
INTRODUCTION

In this project, sales of several products sold on the e-commerce platform ‘Trendyol’ will be predicted. The sold count of each product will be examined and the data will be decomposed. Then several forecasting strategies will be developed, and the best among them will be selected according to its weighted mean absolute percentage error (WMAPE). Data before 29 May 2021 will serve as the training set for the models, and data from 29 May to 11 June 2021 will serve as the test set. Nine products will be examined:

  • 85004 - La Roche Posay Face Cleanser
  • 4066298 - Sleepy Baby Wipes
  • 6676673 - Xiaomi Bluetooth Headphones
  • 7061886 - Fakir Vacuum Cleaner
  • 31515569 - TrendyolMilla Tights
  • 32737302 - TrendyolMilla Bikini Top
  • 32939029 - Oral-B Rechargeable ToothBrush
  • 48740784 - Altınyıldız Classics Jacket
  • 73318567 - TrendyolMilla Bikini Top

Since campaign dates matter for sales, and most sales peaks occur during these periods, Trendyol’s campaign dates were gathered as external data and included as the input attribute ‘is_campaign’. The dates were taken from Trendyol’s website.
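
As a minimal sketch, such a dummy can be built by flagging dates that fall inside known campaign windows. The dates below are illustrative placeholders, not the actual Trendyol campaign calendar:

```r
# Hypothetical data frame of daily observations
sold <- data.frame(event_date = seq(as.Date("2021-05-25"), as.Date("2021-06-05"), by = "day"))

# Illustrative campaign window (not the real Trendyol campaign dates)
campaign_days <- seq(as.Date("2021-05-28"), as.Date("2021-05-30"), by = "day")

# 1 on campaign days, 0 otherwise
sold$is_campaign <- as.integer(sold$event_date %in% campaign_days)
```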

PRODUCT 1 - La Roche Posay Face Cleanser

Before building forecasting models, the data should be plotted and its trend and seasonality examined. Below you can see the plot of the sales quantity of Product 1. There is a slight upward trend, especially in the middle of the series, and no significant seasonality is apparent. For a closer look, a plot of three months of 2021 (March, April, and May) follows. Again, the seasonality is not strong, but sales are higher at the beginning of each month and decline toward its end, so a monthly seasonality can be argued.

Linear Regression Model For Product 1

The first type of model to be used is linear regression. First of all, it is wise to select helpful attributes from the correlation matrix. Below you can see the correlations between the attributes. According to this matrix, category_sold, category_favored, and basket_count can be added to the model.

In the first model, these attributes are included. The adjusted R-squared value indicates how well the model fits; for the first model it is fairly high, which is a good sign. However, there are outliers, probably due to campaigns and holidays, and accounting for them can improve the model. Lastly, a ‘lag1’ attribute is added because the lag-1 autocorrelation in the ACF is very high. In the final linear regression model, the adjusted R-squared is high and the diagnostic plots are acceptable, so the model can be used for prediction.
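
The three-step procedure above can be sketched on simulated data. The variable names mirror the report, but the numbers here are synthetic, so this illustrates the workflow rather than reproducing the actual fit:

```r
set.seed(1)
n <- 200
sold <- data.frame(
  category_sold    = rpois(n, 500),
  category_favored = rpois(n, 2000),
  basket_count     = rpois(n, 300)
)
sold$sold_count <- 5 + 0.5 * sold$category_sold + 0.5 * sold$basket_count + rnorm(n, sd = 5)

# Step 1: regressors picked from the correlation matrix
fit1 <- lm(sold_count ~ category_sold + category_favored + basket_count, data = sold)

# Step 2: dummy out large residuals (campaign/holiday outliers)
sold$big_outlier <- as.integer(abs(residuals(fit1)) > 2 * sd(residuals(fit1)))
fit2 <- lm(sold_count ~ big_outlier + category_sold + category_favored + basket_count, data = sold)

# Step 3: add the lag-1 sold count, motivated by the high lag-1 ACF
sold$lag1 <- c(NA, head(sold$sold_count, -1))
fit3 <- lm(sold_count ~ lag1 + big_outlier + category_sold + category_favored + basket_count, data = sold)
summary(fit3)$adj.r.squared
```

The 2-standard-deviation outlier threshold is a judgment call; any rule that isolates the campaign-driven spikes would serve the same purpose.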

## 
## Call:
## lm(formula = sold_count ~ category_sold + category_favored + 
##     basket_count, data = sold)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -86.278 -11.238  -0.387   8.763 168.980 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.7442865  2.8040394   1.692   0.0915 .  
## category_sold     0.1187613  0.0062677  18.948  < 2e-16 ***
## category_favored -0.0015302  0.0002083  -7.347 1.34e-12 ***
## basket_count      0.1407651  0.0090971  15.474  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.46 on 365 degrees of freedom
## Multiple R-squared:  0.8403, Adjusted R-squared:  0.839 
## F-statistic: 640.4 on 3 and 365 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 140.82, df = 10, p-value < 2.2e-16

##    sold_count    
##  Min.   : 14.00  
##  1st Qu.: 33.00  
##  Median : 56.00  
##  Mean   : 74.17  
##  3rd Qu.: 89.00  
##  Max.   :447.00
## 
## Call:
## lm(formula = sold_count ~ big_outlier + category_sold + category_favored + 
##     basket_count, data = sold)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -80.651  -8.335  -1.034   8.277 121.209 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      11.5878617  2.3643596   4.901 1.44e-06 ***
## big_outlier      76.5329182  5.7826657  13.235  < 2e-16 ***
## category_sold     0.0867377  0.0056964  15.227  < 2e-16 ***
## category_favored -0.0008900  0.0001781  -4.998 9.01e-07 ***
## basket_count      0.1075103  0.0078954  13.617  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.95 on 364 degrees of freedom
## Multiple R-squared:  0.8922, Adjusted R-squared:  0.891 
## F-statistic: 753.2 on 4 and 364 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 112.47, df = 10, p-value < 2.2e-16
## 
## Call:
## lm(formula = sold_count ~ lag1 + big_outlier + category_sold + 
##     category_favored + basket_count, data = sold)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -78.630  -7.746  -0.706   7.253 123.997 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       9.9269325  2.0130544   4.931 1.25e-06 ***
## lag1              0.5443102  0.0457488  11.898  < 2e-16 ***
## big_outlier      63.1752763  5.0382831  12.539  < 2e-16 ***
## category_sold     0.0940932  0.0048777  19.290  < 2e-16 ***
## category_favored -0.0009748  0.0001514  -6.438 3.84e-10 ***
## basket_count      0.1106151  0.0067112  16.482  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.79 on 363 degrees of freedom
## Multiple R-squared:  0.9225, Adjusted R-squared:  0.9214 
## F-statistic: 863.6 on 5 and 363 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 18.357, df = 10, p-value = 0.04924

ARIMA Model For Product 1

The second type of model to be built is ARIMA. For this model, the data should first be decomposed, which requires choosing a frequency. Since there is no significant seasonality, the lag with the highest ACF value, 63, is used as the frequency. Additive decomposition is applied. The random series can be seen below.

After the decomposition, the (p,d,q) orders should be chosen by examining the ACF and PACF. Looking at the ACF, q = 1 or q = 7 are candidates; looking at the PACF, p = 1 is a candidate. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below; smaller AIC and BIC values indicate a better model. By these criteria, the (2,0,2) model suggested by auto.arima is the best among them. After the order is selected, the regressors most correlated with the sold count are added to improve the model. The final model has lower AIC and BIC values, so we can proceed with it.
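
The decompose-then-fit workflow can be sketched on a simulated series. The frequency, order, and regressor below are illustrative stand-ins for the ones chosen in the report:

```r
set.seed(2)
n <- 365
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = n)) + 50

# Decompose at the chosen frequency and keep the remainder
series  <- ts(x, frequency = 63)
dec     <- decompose(series, type = "additive")
detrend <- x - as.numeric(dec$trend) - as.numeric(dec$seasonal)

# Fit a candidate order; lower AIC/BIC indicates a better model
# (stats::arima tolerates the NA ends left by the moving-average trend)
fit <- arima(detrend, order = c(1, 0, 1))
c(AIC = AIC(fit), BIC = BIC(fit))

# External regressors enter through the xreg argument
xreg  <- matrix(rnorm(n), ncol = 1)
fit_x <- arima(detrend, order = c(1, 0, 1), xreg = xreg)
```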

## 
## Call:
## arima(x = detrend, order = c(1, 0, 1))
## 
## Coefficients:
##          ar1     ma1  intercept
##       0.6650  0.0123    -1.5566
## s.e.  0.0574  0.0702     6.0436
## 
## sigma^2 estimated as 1244:  log likelihood = -1529.77,  aic = 3067.54
## [1] 3067.536
## [1] 3082.443
## 
## Call:
## arima(x = detrend, order = c(1, 0, 7))
## 
## Coefficients:
##          ar1      ma1      ma2      ma3      ma4      ma5      ma6      ma7
##       0.8658  -0.2496  -0.0680  -0.1138  -0.2193  -0.1632  -0.0457  -0.1405
## s.e.  0.0427   0.0696   0.0622   0.0643   0.0589   0.0551   0.0697   0.0702
##       intercept
##         -0.4768
## s.e.     0.5468
## 
## sigma^2 estimated as 1129:  log likelihood = -1516.43,  aic = 3052.87
## [1] 3052.868
## [1] 3090.136
## Series: detrend 
## ARIMA(2,0,2) with zero mean 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2
##       1.5221  -0.6871  -0.8673  0.1966
## s.e.  0.1703   0.0984   0.1811  0.0930
## 
## sigma^2 estimated as 1201:  log likelihood=-1522.43
## AIC=3054.86   AICc=3055.06   BIC=3073.5
## [1] 3054.864
## [1] 3073.498
## 
## Call:
## arima(x = detrend, order = c(2, 0, 2), xreg = xreg)
## 
## Coefficients:
##          ar1      ar2      ma1     ma2  intercept   xreg1   xreg2
##       0.8477  -0.1219  -0.1993  0.1917   -52.5534  0.1673  -2e-04
## s.e.  0.2838   0.2328   0.2780  0.0934     7.9501  0.0180   3e-04
## 
## sigma^2 estimated as 780.6:  log likelihood = -1458.35,  aic = 2932.71
## [1] 2932.707
## [1] 2962.521

Comparison Of Models

We selected two models for prediction; their accuracy measures can be seen below. According to the box plot, the absolute percentage errors of the linear model vary more, especially toward the end of the test period. The ARIMA model should be chosen because its WMAPE is lower, which indicates a better model.

##          variable  n     mean       sd        CV       FBias      MAPE     RMSE
## 1:  lm_prediction 14 83.35714 17.09074 0.2050303 -0.72352232 0.8010225 109.8228
## 2: selected_arima 14 83.35714 17.09074 0.2050303 -0.03885441 0.3287008  35.2479
##         MAD      MADP     WMAPE
## 1: 63.38325 0.7603817 0.7603817
## 2: 26.33523 0.3159325 0.3159325
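
The WMAPE figures in the table above follow the usual definition; a minimal sketch (the function name wmape is ours):

```r
# WMAPE = sum(|actual - forecast|) / sum(actual)
wmape <- function(actual, forecast) {
  sum(abs(actual - forecast)) / sum(actual)
}

wmape(actual = c(100, 120, 80), forecast = c(90, 130, 85))
```

A lower WMAPE means the forecasts deviate less from the actual sales, weighted by sales volume, which is why it is used as the selection criterion throughout.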

In conclusion, here is a plot of the actual test set together with the predictions of the chosen model. As can be seen, the predictions are quite accurate.

PRODUCT 2 - Sleepy Baby Wipes

Before building forecasting models for Product 2, the data should be plotted and its trend and seasonality examined. Below you can see the plot of the sales quantity of Product 2. There is no significant trend, nor any obvious seasonality. For a closer look, a plot of three months of 2021 (March, April, and May) follows. Again, the seasonality is not significant, though there is a spike at the beginning of each month. In May there is a large rise, probably due to Covid-19 conditions. In conclusion, a monthly seasonality can be argued, but it is not very clear.

Linear Regression Model For Product 2

The first type of model to be used is linear regression. First of all, it is wise to select helpful attributes from the correlation matrix. Below you can see the correlations between the attributes. According to this matrix, category_sold, category_visits, and basket_count can be added to the model.

In the first model, these attributes are included. The adjusted R-squared value indicates how well the model fits; for the first model it is fairly high, which is a good sign. However, there are outliers, probably due to campaigns and holidays, and accounting for them can improve the model. Lastly, a ‘lag1’ attribute is added because the lag-1 autocorrelation in the ACF is very high. In the final linear regression model, the adjusted R-squared is high and the diagnostic plots are acceptable, so the model can be used for prediction.

## 
## Call:
## lm(formula = sold_count ~ category_sold + category_visits + basket_count, 
##     data = sold)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -422.40  -60.15    1.95   63.20 1208.91 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -60.17090   11.42700  -5.266 2.39e-07 ***
## category_sold     0.14185    0.02200   6.449 3.58e-10 ***
## category_visits   0.00693    0.01256   0.552    0.581    
## basket_count      0.18780    0.01162  16.161  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 128.9 on 365 degrees of freedom
## Multiple R-squared:  0.9068, Adjusted R-squared:  0.906 
## F-statistic:  1183 on 3 and 365 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 125.12, df = 10, p-value < 2.2e-16

##    sold_count    
##  Min.   :  30.0  
##  1st Qu.: 165.0  
##  Median : 238.0  
##  Mean   : 381.4  
##  3rd Qu.: 431.0  
##  Max.   :4191.0
## 
## Call:
## lm(formula = sold_count ~ big_outlier + category_sold + category_visits + 
##     basket_count, data = sold)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -356.35  -52.28   10.07   53.54 1315.86 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -2.148e+01  1.241e+01  -1.730   0.0845 .  
## big_outlier      2.303e+02  3.592e+01   6.410 4.51e-10 ***
## category_sold    1.425e-01  2.088e-02   6.824 3.71e-11 ***
## category_visits -4.873e-04  1.198e-02  -0.041   0.9676    
## basket_count     1.477e-01  1.268e-02  11.655  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 122.4 on 364 degrees of freedom
## Multiple R-squared:  0.9162, Adjusted R-squared:  0.9153 
## F-statistic: 995.2 on 4 and 364 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 95.607, df = 10, p-value = 4.11e-16
## 
## Call:
## lm(formula = sold_count ~ lag1 + big_outlier + category_sold + 
##     category_visits + basket_count, data = sold)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -381.58  -37.12    4.89   39.84 1334.45 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -40.28635   11.39508  -3.535  0.00046 ***
## lag1              0.44599    0.04880   9.140  < 2e-16 ***
## big_outlier     178.62606   32.91952   5.426 1.05e-07 ***
## category_sold     0.13014    0.01890   6.886 2.54e-11 ***
## category_visits   0.01271    0.01091   1.165  0.24494    
## basket_count      0.15168    0.01145  13.244  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 110.5 on 363 degrees of freedom
## Multiple R-squared:  0.9319, Adjusted R-squared:  0.931 
## F-statistic: 993.4 on 5 and 363 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 74.502, df = 10, p-value = 5.947e-12

ARIMA Model For Product 2

The second type of model to be built is ARIMA. For this model, the data should first be decomposed, which requires choosing a frequency. Since there is no significant seasonality, the lag with the highest ACF value, 34, is used as the frequency. Additive decomposition is applied. The random series can be seen below.

After the decomposition, the (p,d,q) orders should be chosen by examining the ACF and PACF. Looking at the ACF, q = 1 or q = 11 are candidates; looking at the PACF, p = 1 is a candidate. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below. By these criteria, the (1,0,11) model has the lowest AIC and is the best among them. After the order is selected, the regressors most correlated with the sold count are added to improve the model. The final model has lower AIC and BIC values, so we can proceed with it.

## 
## Call:
## arima(x = detrend, order = c(1, 0, 1))
## 
## Coefficients:
##          ar1     ma1  intercept
##       0.5985  0.1204    -2.2120
## s.e.  0.0598  0.0686    45.0812
## 
## sigma^2 estimated as 88277:  log likelihood = -2383.17,  aic = 4774.34
## [1] 4774.343
## [1] 4789.6
## 
## Call:
## arima(x = detrend, order = c(1, 0, 11))
## 
## Coefficients:
##          ar1     ma1     ma2      ma3      ma4      ma5      ma6      ma7
##       0.5115  0.0898  0.0048  -0.1392  -0.1806  -0.2103  -0.1589  -0.1076
## s.e.  0.2066  0.2088  0.1286   0.0770   0.0556   0.0745   0.0945   0.0925
##           ma8      ma9     ma10     ma11  intercept
##       -0.0942  -0.0735  -0.0572  -0.0731     0.3060
## s.e.   0.0784   0.0771   0.0727   0.0640     2.0291
## 
## sigma^2 estimated as 76841:  log likelihood = -2361.76,  aic = 4751.51
## [1] 4751.515
## [1] 4804.913
## Series: detrend 
## ARIMA(3,0,0) with zero mean 
## 
## Coefficients:
##          ar1      ar2      ar3
##       0.7228  -0.0081  -0.1412
## s.e.  0.0540   0.0669   0.0539
## 
## sigma^2 estimated as 86941:  log likelihood=-2379.15
## AIC=4766.29   AICc=4766.41   BIC=4781.55
## [1] 4766.292
## [1] 4781.549
## 
## Call:
## arima(x = detrend, order = c(1, 0, 11), xreg = xreg)
## 
## Coefficients:
##          ar1     ma1    ma2     ma3     ma4    ma5     ma6     ma7     ma8
##       0.5558  0.1483  0.178  0.1079  0.0327  8e-04  0.0653  0.0634  0.0101
## s.e.     NaN     NaN    NaN     NaN     NaN    NaN     NaN     NaN     NaN
##          ma9    ma10    ma11  intercept   xreg1   xreg2   xreg3
##       0.0076  0.0436  0.0388  -450.0970  0.1404  0.0732  0.0487
## s.e.     NaN  0.0533  0.0598    33.0371  0.0164  0.0184  0.0316
## 
## sigma^2 estimated as 19786:  log likelihood = -2132.8,  aic = 4299.6
## [1] 4299.597
## [1] 4364.438

Comparison Of Models

We selected two models for prediction; their accuracy measures can be seen below. According to the box plot, the errors of the ARIMA model are higher. The linear model should be chosen because its WMAPE is lower, which indicates a better model.

##          variable  n     mean      sd        CV      FBias      MAPE     RMSE
## 1:  lm_prediction 14 542.4286 335.978 0.6193958 -0.1358889 0.2050354 263.4115
## 2: selected_arima 14 542.4286 335.978 0.6193958  0.8441860 0.8331456 649.5670
##         MAD      MADP     WMAPE
## 1: 115.9278 0.2137200 0.2137200
## 2: 512.1721 0.9442203 0.9442203

In conclusion, here is a plot of the actual test set together with the predictions of the chosen model. As can be seen, the predictions are quite accurate.

PRODUCT 3 - Xiaomi Bluetooth Headphones

Looking at the plots of the product below: in the line graph it can be observed that the sales are highly variable, with peaks on some dates, and there may be cyclical behaviour, which is an indicator of seasonality. For further investigation, the ‘3 Months Sales of 2021’ plot can be examined; no clear repeating pattern is easily observed.

Looking at the boxplots: in the weekly boxplot, sales on the different weekdays seem similar, so daily and weekly seasonality can be investigated further. In the monthly boxplot there is variation across months, but no clear repeating monthly behaviour. The histograms show that the distribution of sales is close to normal.

Trying Different ARIMA Models for Product 3 - 6676673

Firstly, different ARIMA models are built so that they can be tested on the test set. Before building an ARIMA model, the data should be decomposed, which requires choosing a frequency. Frequencies of 30 and 7 days are selected and the data is decomposed accordingly. In addition to these, the ACF plot of the data is examined, and a lag with high autocorrelation is chosen as another trial frequency. Since the variance does not appear to be increasing, additive decomposition is used. The random series can be seen below.

Decomposition with 7 Day Freq

Decomposition with 30 Day Freq

The decomposed series above correspond to time series with frequencies of 7 and 30 days, respectively.

Looking at the ACF plot of the series, the highest ACF value belongs to lag 32, so a decomposition with a 32-day frequency is also tried.

In time series decomposition, the random (remainder) part is assumed to be randomly distributed with mean zero and constant variance; to decide on the best frequency, the random parts of the decomposed series should be compared. In this case, the random part of the 7-day decomposition looks closest to such a series, so it is chosen as the final decomposition.
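
One way to make this comparison concrete is to decompose the series at each candidate frequency and summarize the remainder. A sketch on simulated data (the series and frequencies are illustrative):

```r
set.seed(3)
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 364)) + 20

# Decompose at each candidate frequency and summarize the random part;
# the remainder that looks most like mean-zero noise is preferred
random_summary <- sapply(c(7, 30), function(f) {
  dec <- decompose(ts(x, frequency = f), type = "additive")
  c(mean = mean(dec$random, na.rm = TRUE), sd = sd(dec$random, na.rm = TRUE))
})
colnames(random_summary) <- c("freq7", "freq30")
random_summary
```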

After the decomposition, the (p,d,q) orders should be chosen by examining the ACF and PACF: peaks in the ACF suggest candidate q values, and peaks in the PACF suggest candidate p values. Looking at the ACF, q = 3 or q = 4 may be selected; looking at the PACF, p = 3 or p = 9 may be selected. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below; smaller AIC and BIC values indicate a better model. By these criteria, the (3,0,4) model chosen from the ACF and PACF is the best among them, while the (0,0,1) model suggested by auto.arima has a noticeably higher AIC.
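
Reading candidate orders off the ACF and PACF can be sketched as follows, here on a simulated AR(1) series (with the real data one would apply the same calls to the detrended series):

```r
set.seed(4)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 300))

# Peaks in the ACF suggest candidate q values,
# peaks in the PACF suggest candidate p values
a <- acf(x, plot = FALSE)
p <- pacf(x, plot = FALSE)

which.max(abs(a$acf[-1]))  # lag with the largest autocorrelation (lag 0 dropped)

# With the 'forecast' package installed, the order search can be automated:
# fit <- forecast::auto.arima(x)
```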

## 
## Call:
## arima(x = detrend, order = c(3, 0, 3))
## 
## Coefficients:
##          ar1     ar2      ar3      ma1      ma2      ma3  intercept
##       0.3596  0.1296  -0.3567  -0.5363  -0.4437  -0.0199    -0.0171
## s.e.  0.1564  0.2232   0.1428   0.1644   0.2472   0.2030     0.0711
## 
## sigma^2 estimated as 9101:  log likelihood = -2381.81,  aic = 4779.61
## [1] 4779.61
## [1] 4811.502
## 
## Call:
## arima(x = detrend, order = c(3, 0, 4))
## 
## Coefficients:
##          ar1     ar2      ar3      ma1      ma2     ma3     ma4  intercept
##       0.8120  0.1637  -0.1956  -1.0467  -0.4247  0.0305  0.4411    -0.0200
## s.e.  0.4263  0.7867   0.4415   0.3881   0.9524  0.7568  0.1966     0.0114
## 
## sigma^2 estimated as 8602:  log likelihood = -2373.03,  aic = 4764.07
## [1] 4764.067
## [1] 4799.945
## 
## Call:
## arima(x = detrend, order = c(9, 0, 4))
## 
## Coefficients:
##          ar1     ar2     ar3      ar4     ar5     ar6      ar7     ar8      ar9
##       0.5253  0.3001  0.1109  -0.4532  0.2384  0.0173  -0.0734  0.1143  -0.0900
## s.e.  0.1112  0.1566  0.1444   0.1133  0.0740  0.0734   0.0676  0.0623   0.0563
##           ma1      ma2      ma3     ma4  intercept
##       -0.7536  -0.6178  -0.4313  0.8027    -0.0193
## s.e.   0.1060   0.1732   0.1505  0.0946     0.0126
## 
## sigma^2 estimated as 8307:  log likelihood = -2366.08,  aic = 4762.16
## [1] 4762.156
## [1] 4821.952
## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,0,2)           with non-zero mean : 4804.437
##  ARIMA(0,0,0)           with non-zero mean : 4936.799
##  ARIMA(1,0,0)           with non-zero mean : 4930.902
##  ARIMA(0,0,1)           with non-zero mean : 4928.255
##  ARIMA(0,0,0)           with zero mean     : 4934.779
##  ARIMA(1,0,2)           with non-zero mean : Inf
##  ARIMA(2,0,1)           with non-zero mean : 4804.68
##  ARIMA(3,0,2)           with non-zero mean : Inf
##  ARIMA(2,0,3)           with non-zero mean : Inf
##  ARIMA(1,0,1)           with non-zero mean : 4930.494
##  ARIMA(1,0,3)           with non-zero mean : Inf
##  ARIMA(3,0,1)           with non-zero mean : Inf
##  ARIMA(3,0,3)           with non-zero mean : Inf
##  ARIMA(2,0,2)           with zero mean     : 4802.813
##  ARIMA(1,0,2)           with zero mean     : 4829.38
##  ARIMA(2,0,1)           with zero mean     : 4803.171
##  ARIMA(3,0,2)           with zero mean     : Inf
##  ARIMA(2,0,3)           with zero mean     : Inf
##  ARIMA(1,0,1)           with zero mean     : 4928.454
##  ARIMA(1,0,3)           with zero mean     : Inf
##  ARIMA(3,0,1)           with zero mean     : Inf
##  ARIMA(3,0,3)           with zero mean     : Inf
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(2,0,2)           with zero mean     : Inf
##  ARIMA(2,0,1)           with zero mean     : Inf
##  ARIMA(2,0,2)           with non-zero mean : Inf
##  ARIMA(2,0,1)           with non-zero mean : Inf
##  ARIMA(1,0,2)           with zero mean     : Inf
##  ARIMA(0,0,1)           with non-zero mean : 4928.265
## 
##  Best model: ARIMA(0,0,1)           with non-zero mean
## Series: detrend 
## ARIMA(0,0,1) with non-zero mean 
## 
## Coefficients:
##          ma1    mean
##       0.1699  -0.140
## s.e.  0.0484   6.876
## 
## sigma^2 estimated as 13828:  log likelihood=-2461.1
## AIC=4928.2   AICc=4928.26   BIC=4940.16
## [1] 4928.204
## [1] 4940.163

Trying Different Linear Regression Models For Product 3

The second type of model to be used is linear regression. Below you can see the correlations between the attributes. According to this matrix, basket_count, price_count, visit_count, and favored_count can be added to the model. Since the boxplots above showed monthly variation in the data, month information can also be added to the candidate models.

Comparison of the Linear Regression and ARIMA Models for Product 3

The performance of the different linear regression and ARIMA models on the test dates is calculated, and the best model is selected accordingly.

##             variable  n     mean       sd        CV       FBias       MAPE
## 1:    lm_prediction2 14 451.5714 90.71063 0.2008777 -0.02509715 0.09312132
## 2:    lm_prediction3 14 451.5714 90.71063 0.2008777 -0.07632216 0.11880289
## 3:    lm_prediction4 14 451.5714 90.71063 0.2008777 -0.08353170 0.11647223
## 4:    lm_prediction5 14 451.5714 90.71063 0.2008777 -0.11399446 0.12828656
## 5:    lm_prediction6 14 451.5714 90.71063 0.2008777 -0.03476233 0.07662185
## 6:    lm_prediction7 14 451.5714 90.71063 0.2008777 -0.10582440 0.12395939
## 7:  arima_prediction 14 451.5714 90.71063 0.2008777  0.05141121 0.12779687
## 8: sarima_prediction 14 451.5714 90.71063 0.2008777  0.05256333 0.12798436
## 9:    selected_arima 14 451.5714 90.71063 0.2008777  0.09418716 0.17941751
##         RMSE      MAD       MADP      WMAPE
## 1:  49.31985 40.23665 0.08910363 0.08910363
## 2:  58.61266 50.09150 0.11092707 0.11092707
## 3:  59.53828 49.53706 0.10969928 0.10969928
## 4:  64.90818 55.99223 0.12399418 0.12399418
## 5:  42.32081 32.44684 0.07185318 0.07185318
## 6:  60.99548 52.93493 0.11722384 0.11722384
## 7:  77.45611 61.04713 0.13518821 0.13518821
## 8:  77.46723 61.18399 0.13549128 0.13549128
## 9: 100.82860 81.07444 0.17953847 0.17953847

The smallest weighted mean absolute percentage error is obtained for the linear regression model ‘sold_count ~ basket_count + visit_count + as.factor(mon) + as.factor(is_campaign)’, so this model is selected for our prediction purposes.

In conclusion, here is a plot of the actual test set together with the predictions of the chosen model. As can be seen, the predictions are quite accurate.

One Day Ahead Prediction with the Selected Model for Product 3

With the selected model, a one-day-ahead prediction can be made using all the data on hand, since a one-day-ahead forecast must be submitted in this competition.
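
A sketch of this step with a simple lm on simulated data (the regressor values in newdata are illustrative; the real model also includes the month and campaign factors):

```r
set.seed(5)
d <- data.frame(basket_count = rpois(60, 300), visit_count = rpois(60, 1000))
d$sold_count <- 10 + 0.3 * d$basket_count + 0.2 * d$visit_count + rnorm(60, sd = 5)

# Refit on all available data
fit <- lm(sold_count ~ basket_count + visit_count, data = d)

# newdata holds the regressor values observed for the day to be predicted
newdata <- data.frame(basket_count = 310, visit_count = 1050)
predict(fit, newdata = newdata)
```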

##     price event_date product_content_id sold_count visit_count favored_count
## 1: 114.15 2021-07-02            6676673        307       11850           672
##    basket_count category_sold category_brand_sold category_visits ty_visits
## 1:         1001          4255                 778          224985  99819109
##    category_basket category_favored w_day mon is_campaign
## 1:           18828            17424     6   7           0
##     price event_date product_content_id sold_count visit_count favored_count
## 1: 114.15 2021-07-04            6676673        307       11850           672
##    basket_count category_sold category_brand_sold category_visits ty_visits
## 1:         1001          4255                 778          224985  99819109
##    category_basket category_favored w_day mon is_campaign lm_prediction
## 1:           18828            17424     6   7           0      380.0093

PRODUCT 4 - Fakir Vacuum Cleaner

Looking at the plots of the product below: in the line graph it can be observed that the sales are highly variable, with high outliers on some dates, and there may be cyclical behaviour, which is an indicator of seasonality. For further investigation, the ‘3 Months Sales of 2021’ plot can be examined; no clear repeating pattern is easily observed.

Looking at the boxplots: in the weekly boxplot, sales on the different weekdays seem similar, so daily and weekly seasonality can be investigated further. In the monthly boxplot there is variation across months, but no clear repeating monthly behaviour. The histograms show that the distribution of sales is close to normal.

Trying Different ARIMA Models for Product 4 - 7061886

Firstly, different ARIMA models are built so that they can be tested on the test set. Frequencies of 30 and 7 days are selected and the data is decomposed accordingly. Since the variance does not appear to be increasing, additive decomposition is used. The random series can be seen below.

The decomposed series above correspond to time series with frequencies of 7 and 30 days, respectively. Looking at the ACF plot of the series, the highest ACF value belongs to lag 16, so a decomposition with a 16-day frequency is also tried.

In this case, the random part of the 16-day decomposition looks closest to a randomly distributed series with mean zero and constant variance, so it is chosen as the final decomposition.

Looking at the ACF, q = 5 or q = 7 may be selected; looking at the PACF, p = 1 or p = 3 may be selected. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below. The ARIMA(3,0,5) model selected from the ACF and PACF plots has a smaller AIC than the ARIMA(1,0,2) model suggested by auto.arima, so ARIMA(3,0,5) will be used for the performance comparison with the linear models.

## 
## Call:
## arima(x = detrend, order = c(3, 0, 7))
## 
## Coefficients:
##          ar1     ar2      ar3      ma1      ma2     ma3      ma4     ma5
##       0.7349  0.7381  -0.5911  -0.4786  -0.9036  0.0535  -0.0504  0.1568
## s.e.     NaN     NaN      NaN      NaN      NaN  0.0764   0.0763  0.0787
##          ma6     ma7  intercept
##       0.1090  0.1132    -0.1069
## s.e.  0.0433  0.0595     0.0620
## 
## sigma^2 estimated as 13934:  log likelihood = -2406.41,  aic = 4836.82
## [1] 4836.816
## [1] 4884.348
## 
## Call:
## arima(x = detrend, order = c(3, 0, 5))
## 
## Coefficients:
##          ar1     ar2      ar3      ma1     ma2     ma3     ma4     ma5
##       0.7781  0.8652  -0.7584  -0.5273  -1.063  0.1628  0.1188  0.3088
## s.e.     NaN     NaN      NaN      NaN     NaN  0.0807  0.0586  0.0509
##       intercept
##         -0.0955
## s.e.     0.0798
## 
## sigma^2 estimated as 14197:  log likelihood = -2409.5,  aic = 4839
## [1] 4839.004
## [1] 4878.614
## 
## Call:
## arima(x = detrend, order = c(1, 0, 5))
## 
## Coefficients:
##          ar1      ma1      ma2      ma3      ma4      ma5  intercept
##       0.5853  -0.2723  -0.0939  -0.2944  -0.1989  -0.1404    -0.0759
## s.e.  0.0724   0.0781   0.0544   0.0586   0.0594   0.0597     0.3711
## 
## sigma^2 estimated as 14991:  log likelihood = -2418.13,  aic = 4852.26
## [1] 4852.26
## [1] 4883.949
## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,0,2)            with non-zero mean : 4865.152
##  ARIMA(0,0,0)            with non-zero mean : 5017.066
##  ARIMA(1,0,0)            with non-zero mean : 4917.217
##  ARIMA(0,0,1)            with non-zero mean : 4938.712
##  ARIMA(0,0,0)            with zero mean     : 5015.046
##  ARIMA(1,0,2)            with non-zero mean : 4907.553
##  ARIMA(2,0,1)            with non-zero mean : 4920.622
##  ARIMA(3,0,2)            with non-zero mean : Inf
##  ARIMA(2,0,3)            with non-zero mean : 4857.729
##  ARIMA(1,0,3)            with non-zero mean : 4908.329
##  ARIMA(3,0,3)            with non-zero mean : Inf
##  ARIMA(2,0,4)            with non-zero mean : 4859.033
##  ARIMA(1,0,4)            with non-zero mean : Inf
##  ARIMA(3,0,4)            with non-zero mean : Inf
##  ARIMA(2,0,3)            with zero mean     : 4856.042
##  ARIMA(1,0,3)            with zero mean     : 4906.267
##  ARIMA(2,0,2)            with zero mean     : 4863.372
##  ARIMA(3,0,3)            with zero mean     : Inf
##  ARIMA(2,0,4)            with zero mean     : 4857.376
##  ARIMA(1,0,2)            with zero mean     : 4905.5
##  ARIMA(1,0,4)            with zero mean     : Inf
##  ARIMA(3,0,2)            with zero mean     : Inf
##  ARIMA(3,0,4)            with zero mean     : Inf
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(2,0,3)            with zero mean     : Inf
##  ARIMA(2,0,4)            with zero mean     : Inf
##  ARIMA(2,0,3)            with non-zero mean : Inf
##  ARIMA(2,0,4)            with non-zero mean : Inf
##  ARIMA(2,0,2)            with zero mean     : Inf
##  ARIMA(2,0,2)            with non-zero mean : Inf
##  ARIMA(1,0,2)            with zero mean     : 4904.915
## 
##  Best model: ARIMA(1,0,2)            with zero mean
## Series: detrend 
## ARIMA(1,0,2) with zero mean 
## 
## Coefficients:
##          ar1     ma1     ma2
##       0.1387  0.3543  0.2752
## s.e.  0.1436  0.1378  0.0693
## 
## sigma^2 estimated as 17847:  log likelihood=-2448.41
## AIC=4904.81   AICc=4904.91   BIC=4920.65
## [1] 4904.81
## [1] 4920.654

Trying Different Linear Regression Models For Product 4

Below, you can see the correlations between the attributes. According to this matrix, basket_count, category_favored, is_campaign, and category_sold can be added to the model in different combinations. Since the box plots above showed monthly variation in the data, month information can also be added to the candidate models.
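A minimal sketch of how such candidate models can be built with lm(), assuming a table with the attribute names from the correlation matrix; the toy data below merely stands in for the real Product 4 series.

```r
# Toy stand-in for the product data; the real report uses the Trendyol series.
set.seed(1)
n <- 120
data_7061886 <- data.frame(
  sold_count       = rpois(n, 40),
  basket_count     = rpois(n, 60),
  category_sold    = rpois(n, 700),
  category_favored = rpois(n, 4500),
  is_campaign      = rbinom(n, 1, 0.1),
  mon              = rep(1:4, each = 30)
)

# Candidate models combining the correlated attributes and the month factor
fit1 <- lm(sold_count ~ basket_count, data = data_7061886)
fit2 <- lm(sold_count ~ basket_count + as.factor(mon), data = data_7061886)
fit3 <- lm(sold_count ~ basket_count + category_sold + is_campaign,
           data = data_7061886)

# Quick in-sample check before the test-set comparison
sapply(list(fit1, fit2, fit3), function(m) summary(m)$adj.r.squared)
```

Each candidate is then evaluated on the held-out test dates rather than by in-sample fit alone.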

Comparison of the Linear Regression and ARIMA Models for Product 4

The performance of the different linear regression and ARIMA models on the test dates will be calculated, and the best model will be selected according to this performance.
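The headline statistic in the comparison table below is WMAPE, the sum of absolute errors divided by the sum of actuals. A minimal sketch of the statistics, under common definitions (the report's exact accuracy function may differ slightly, e.g. in the sign convention of FBias):

```r
# Accuracy statistics for a forecast against actuals;
# WMAPE = sum(|actual - forecast|) / sum(actual).
accu <- function(actual, forecast) {
  err <- actual - forecast
  data.frame(
    n     = length(actual),
    FBias = sum(err) / sum(actual),          # one common definition
    MAPE  = mean(abs(err / actual)),
    RMSE  = sqrt(mean(err^2)),
    MAD   = mean(abs(err)),
    WMAPE = sum(abs(err)) / sum(actual)
  )
}

# Hypothetical 4-day example, not taken from the report's data
actual   <- c(20, 25, 18, 22)
forecast <- c(21, 23, 19, 20)
accu(actual, forecast)
```

Computing this for each candidate on the 14 test days yields a table like the one below, from which the smallest WMAPE can be read off.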

##             variable  n mean       sd        CV       FBias      MAPE      RMSE
## 1:    lm_prediction1 14   21 7.200427 0.3428775 -0.20693883 0.2697431  5.966694
## 2:    lm_prediction2 14   21 7.200427 0.3428775 -2.97927177 3.4236791 75.719475
## 3:    lm_prediction3 14   21 7.200427 0.3428775 -3.28869474 3.8131788 83.123744
## 4:    lm_prediction4 14   21 7.200427 0.3428775 -3.05884773 3.5175993 76.628455
## 5:    lm_prediction5 14   21 7.200427 0.3428775 -0.35648820 0.3872554 13.486489
## 6:    lm_prediction6 14   21 7.200427 0.3428775 -2.81391925 3.2181353 71.004414
## 7:  arima_prediction 14   21 7.200427 0.3428775 -0.09014912 0.2865406  7.276734
## 8: sarima_prediction 14   21 7.200427 0.3428775  0.02528538 0.2798477  7.197155
## 9:    selected_arima 14   21 7.200427 0.3428775  0.10692728 0.3737146  9.239168
##          MAD      MADP     WMAPE
## 1:  5.193758 0.2473218 0.2473218
## 2: 62.564707 2.9792718 2.9792718
## 3: 69.062590 3.2886947 3.2886947
## 4: 64.235802 3.0588477 3.0588477
## 5:  8.640406 0.4114479 0.4114479
## 6: 59.092304 2.8139193 2.8139193
## 7:  5.722414 0.2724959 0.2724959
## 8:  5.455298 0.2597761 0.2597761
## 9:  7.505308 0.3573956 0.3573956

The smallest weighted mean absolute percentage error is obtained for the linear regression model ‘sold_count ~ basket_count + as.factor(mon)’. However, since it has two input attributes, a small change in either one has an outsized effect on the prediction, so the model with the second-smallest WMAPE is chosen instead: ARIMA(1,1,4) fitted to the decomposed series with 16-day frequency, which is also the model suggested by auto.arima. This model is used for the prediction purposes from here on.

To conclude, here is a plot of the actual test set and the predicted values of the chosen model. As can be seen, the predictions are reasonably close.

One Day Ahead Prediction with the Selected Model for Product 4

With the selected model, a one-day-ahead prediction can be performed using all the data on hand, since a one-day-ahead prediction must be submitted in this competition.
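The fit printed below uses the is_campaign regressor, so the one-step prediction must supply tomorrow's regressor value. A sketch of that step, with a toy series standing in for the trend-removed sold_count (detrend1) and its campaign indicator:

```r
# Toy stand-in for the trend-removed series and its regressor
set.seed(2)
detrend1    <- as.numeric(arima.sim(list(ar = 0.3), n = 200)) * 20
is_campaign <- rbinom(200, 1, 0.1)

# Same order as the selected model, with the campaign flag as xreg
fit <- arima(detrend1, order = c(1, 1, 4), xreg = is_campaign)

# One step ahead; tomorrow is assumed here to be a non-campaign day (0)
pred <- predict(fit, n.ahead = 1, newxreg = 0)
pred$pred
# The final forecast adds the trend component back onto this value.
```

The predicted detrended value plus the trend estimate gives the submitted sold_count forecast.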

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0068 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
## 
## Call:
## arima(x = detrend1, order = c(1, 1, 4), xreg = data_7061886$is_campaign)
## 
## Coefficients:
##           ar1      ma1      ma2      ma3      ma4  data_7061886$is_campaign
##       -0.0682  -0.3683  -0.1179  -0.1860  -0.3279                   21.6048
## s.e.   0.1652   0.1548   0.1040   0.0564   0.0647                    5.4068
## 
## sigma^2 estimated as 452.3:  log likelihood = -1734.73,  aic = 3483.45
## [1] 3483.451
## [1] 3511.16
##     price event_date product_content_id sold_count visit_count favored_count
## 1: 297.08 2021-07-04            7061886         18        1249           131
##    basket_count category_sold category_brand_sold category_visits ty_visits
## 1:           70           737                 163           53346  99819109
##    category_basket category_favored w_day mon is_campaign arima1_prediction
## 1:            2800             4702     6   7           0          4.469781

PRODUCT 5 - TrendyolMilla Tights

Looking at the plots of the product below: in the line graph it can be observed that the sales have increasing variance, on some dates there are high outliers, and there may be a cyclical behaviour, which is an indicator of seasonality. For further investigation, the ‘3 Months Sales of 2021’ plot can be examined; no clear repeating pattern can be easily observed there.

Looking at the boxplots: in the weekly boxplot the sales on weekdays seem similar, so daily and weekly seasonality can be investigated. In the monthly boxplot there is change from month to month, although the medians of the months seem close to each other; this may be an indicator of monthly seasonality. In the histograms, one can observe that the distribution of sales is close to a normal distribution.
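The weekly and monthly boxplots described above can be produced directly from the day-of-week and month columns. A sketch with toy data in place of the real series (column names w_day and mon follow the data shown elsewhere in the report):

```r
# Toy stand-in: 210 days with day-of-week (1-7) and month columns
set.seed(3)
df <- data.frame(
  sold_count = rpois(210, 50),
  w_day      = rep(1:7, 30),
  mon        = rep(1:7, each = 30)
)

par(mfrow = c(1, 2))
boxplot(sold_count ~ w_day, data = df, main = "Weekly",  xlab = "Day of week")
boxplot(sold_count ~ mon,   data = df, main = "Monthly", xlab = "Month")
```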

Trying Different ARIMA Models for Product 5 - 31515569

First, different ARIMA models can be built in order to test them on the test set. Frequencies of 30 and 7 days can be selected and the data decomposed accordingly. Since the variance seems to be increasing, a multiplicative decomposition can be used. The random series can be seen below.
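A sketch of the two decompositions with base decompose(), using a toy positive series in place of sold_count (multiplicative decomposition requires positive values):

```r
# Toy series with a rough 30-day cycle standing in for sold_count
set.seed(4)
sold <- 50 + 10 * sin(2 * pi * (1:360) / 30) + rnorm(360, sd = 5)
sold <- pmax(sold, 1)  # keep the series positive for multiplicative type

dec7  <- decompose(ts(sold, frequency = 7),  type = "multiplicative")
dec30 <- decompose(ts(sold, frequency = 30), type = "multiplicative")

# The random components examined in the text
head(na.omit(dec7$random))
head(na.omit(dec30$random))
```

The random component of each decomposition is what the ARIMA models are fitted to.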

The decomposed series above belong to the time series with 7- and 30-day frequency, respectively. Looking at the ACF plot of the series, the highest ACF value is at lag 16, so a time series decomposition with 16-day frequency would be sufficient.
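The "highest ACF lag" choice can be read off programmatically rather than from the plot. A sketch, with a toy series containing a deliberate period-16 pattern (the report's own series and chosen lag come from its actual data):

```r
# Toy series: a spike every 16 days plus a little noise
set.seed(5)
x <- rep(c(20, rep(2, 15)), 20) + rnorm(320, sd = 0.5)

a    <- acf(x, lag.max = 40, plot = FALSE)
lags <- drop(a$lag)[-1]   # drop lag 0 (always 1)
vals <- drop(a$acf)[-1]
best_lag <- lags[which.max(vals)]
best_lag
```

The lag with the largest autocorrelation is then used as the frequency of the final decomposition.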

In this case, the random part of the decomposed time series with 16-day frequency seems closest to a randomly distributed series with zero mean and unit standard deviation, so it is chosen as the final decomposition.

Looking at the ACF, 2, 5, or 8 may be selected for the ‘q’ value; looking at the PACF, 3 or 4 may be selected for the ‘p’ value. The auto.arima function is used as well. The AIC and BIC values of the suggested models can be seen below. Comparing these, the ARIMA(3,0,5) model selected by inspecting the ACF and PACF plots has a smaller AIC than the ARIMA(1,0,3) model suggested by auto.arima, so ARIMA(3,0,5) will be used for the performance comparison with the linear models.
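The AIC/BIC comparison between the hand-picked and the auto-suggested orders can be done with base arima(). A sketch on a toy ARMA series standing in for the random component (the AIC ranking on the report's real data need not match the toy one):

```r
# Toy stand-in for the random component of the decomposition
set.seed(6)
random_part <- as.numeric(arima.sim(list(ar = c(0.5, -0.3), ma = c(0.4, 0.3)),
                                    n = 300))

# Hand-picked order from the ACF/PACF plots vs. the auto.arima suggestion;
# method = "ML" avoids CSS initialization issues for the larger model
fit_manual <- arima(random_part, order = c(3, 0, 5), method = "ML")
fit_auto   <- arima(random_part, order = c(1, 0, 3), method = "ML")

c(AIC(fit_manual), AIC(fit_auto))
c(BIC(fit_manual), BIC(fit_auto))
# The order with the smaller AIC is carried into the test-set comparison.
```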

Trying Different Linear Regression Models For Product 5

Below, you can see the correlations between the attributes. According to this matrix, basket_count, favored_count, is_campaign, and category_sold can be added to the model in different combinations. Since the box plots above showed monthly variation in the data, month information can also be added to the candidate models.
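The attribute-selection step itself is just a correlation matrix: compute cor() over the candidate columns and read off which attributes move with sold_count. A sketch with toy data in place of the Product 5 series (the deliberate dependence on basket_count is illustrative, not measured):

```r
# Toy columns; sold_count is constructed to correlate with basket_count
set.seed(7)
n <- 150
basket_count  <- rpois(n, 60)
sold_count    <- rpois(n, 5) + 2 * basket_count
favored_count <- rpois(n, 130)
category_sold <- rpois(n, 700)
is_campaign   <- rbinom(n, 1, 0.1)

round(cor(cbind(sold_count, basket_count, favored_count,
                category_sold, is_campaign)), 2)
```

Attributes whose correlation with sold_count is high enter the candidate regression models.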

Comparison of the Linear Regression and ARIMA Models for Product 5

The performance of the different linear regression and ARIMA models on the test dates will be calculated, and the best model will be selected according to this performance.

## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6394.698
##  ARIMA(0,0,0) with non-zero mean : 6220.27
##  ARIMA(0,0,1) with zero mean     : 6126.411
##  ARIMA(0,0,1) with non-zero mean : 6009.282
##  ARIMA(0,0,2) with zero mean     : 6043.806
##  ARIMA(0,0,2) with non-zero mean : 5957.195
##  ARIMA(0,0,3) with zero mean     : 5942.598
##  ARIMA(0,0,3) with non-zero mean : 5884.053
##  ARIMA(0,0,4) with zero mean     : 5921.67
##  ARIMA(0,0,4) with non-zero mean : 5877.716
##  ARIMA(0,0,5) with zero mean     : 5918.286
##  ARIMA(0,0,5) with non-zero mean : 5879.596
##  ARIMA(1,0,0) with zero mean     : 5928.848
##  ARIMA(1,0,0) with non-zero mean : 5911.463
##  ARIMA(1,0,1) with zero mean     : 5929.506
##  ARIMA(1,0,1) with non-zero mean : 5909.434
##  ARIMA(1,0,2) with zero mean     : 5926.647
##  ARIMA(1,0,2) with non-zero mean : 5903.617
##  ARIMA(1,0,3) with zero mean     : 5911.226
##  ARIMA(1,0,3) with non-zero mean : 5877.901
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 5879.617
##  ARIMA(2,0,0) with zero mean     : 5929.216
##  ARIMA(2,0,0) with non-zero mean : 5907.817
##  ARIMA(2,0,1) with zero mean     : 5930.771
##  ARIMA(2,0,1) with non-zero mean : 5902.491
##  ARIMA(2,0,2) with zero mean     : 5925.483
##  ARIMA(2,0,2) with non-zero mean : 5891.825
##  ARIMA(2,0,3) with zero mean     : 5911.948
##  ARIMA(2,0,3) with non-zero mean : 5879.561
##  ARIMA(3,0,0) with zero mean     : 5928.15
##  ARIMA(3,0,0) with non-zero mean : 5900.006
##  ARIMA(3,0,1) with zero mean     : 5930.061
##  ARIMA(3,0,1) with non-zero mean : 5899.854
##  ARIMA(3,0,2) with zero mean     : 5933.983
##  ARIMA(3,0,2) with non-zero mean : 5887.709
##  ARIMA(4,0,0) with zero mean     : 5929.24
##  ARIMA(4,0,0) with non-zero mean : 5894.763
##  ARIMA(4,0,1) with zero mean     : 5925.197
##  ARIMA(4,0,1) with non-zero mean : 5891.308
##  ARIMA(5,0,0) with zero mean     : 5906.657
##  ARIMA(5,0,0) with non-zero mean : 5884.985
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## 
##  ARIMA(0,0,0)            with zero mean     : 6394.698
##  ARIMA(0,0,0)            with non-zero mean : 6220.27
##  ARIMA(0,0,0)(0,0,1)[16] with zero mean     : 6294.138
##  ARIMA(0,0,0)(0,0,1)[16] with non-zero mean : 6182.676
##  ARIMA(0,0,0)(0,0,2)[16] with zero mean     : 6267.205
##  ARIMA(0,0,0)(0,0,2)[16] with non-zero mean : 6181.942
##  ARIMA(0,0,0)(1,0,0)[16] with zero mean     : 6247.732
##  ARIMA(0,0,0)(1,0,0)[16] with non-zero mean : 6180.785
##  ARIMA(0,0,0)(1,0,1)[16] with zero mean     : 6236.257
##  ARIMA(0,0,0)(1,0,1)[16] with non-zero mean : 6182.301
##  ARIMA(0,0,0)(1,0,2)[16] with zero mean     : Inf
##  ARIMA(0,0,0)(1,0,2)[16] with non-zero mean : 6183.096
##  ARIMA(0,0,0)(2,0,0)[16] with zero mean     : 6244.526
##  ARIMA(0,0,0)(2,0,0)[16] with non-zero mean : 6182.286
##  ARIMA(0,0,0)(2,0,1)[16] with zero mean     : Inf
##  ARIMA(0,0,0)(2,0,1)[16] with non-zero mean : Inf
##  ARIMA(0,0,0)(2,0,2)[16] with zero mean     : Inf
##  ARIMA(0,0,0)(2,0,2)[16] with non-zero mean : Inf
##  ARIMA(0,0,1)            with zero mean     : 6126.411
##  ARIMA(0,0,1)            with non-zero mean : 6009.282
##  ARIMA(0,0,1)(0,0,1)[16] with zero mean     : 6069.038
##  ARIMA(0,0,1)(0,0,1)[16] with non-zero mean : 5987.659
##  ARIMA(0,0,1)(0,0,2)[16] with zero mean     : 6052.357
##  ARIMA(0,0,1)(0,0,2)[16] with non-zero mean : 5987.348
##  ARIMA(0,0,1)(1,0,0)[16] with zero mean     : 6043.194
##  ARIMA(0,0,1)(1,0,0)[16] with non-zero mean : 5985.413
##  ARIMA(0,0,1)(1,0,1)[16] with zero mean     : 6028.021
##  ARIMA(0,0,1)(1,0,1)[16] with non-zero mean : 5987.431
##  ARIMA(0,0,1)(1,0,2)[16] with zero mean     : Inf
##  ARIMA(0,0,1)(1,0,2)[16] with non-zero mean : 5989.218
##  ARIMA(0,0,1)(2,0,0)[16] with zero mean     : 6037.264
##  ARIMA(0,0,1)(2,0,0)[16] with non-zero mean : 5987.43
##  ARIMA(0,0,1)(2,0,1)[16] with zero mean     : Inf
##  ARIMA(0,0,1)(2,0,1)[16] with non-zero mean : 5989.3
##  ARIMA(0,0,1)(2,0,2)[16] with zero mean     : Inf
##  ARIMA(0,0,1)(2,0,2)[16] with non-zero mean : 5991.24
##  ARIMA(0,0,2)            with zero mean     : 6043.806
##  ARIMA(0,0,2)            with non-zero mean : 5957.195
##  ARIMA(0,0,2)(0,0,1)[16] with zero mean     : 6001.241
##  ARIMA(0,0,2)(0,0,1)[16] with non-zero mean : 5939.246
##  ARIMA(0,0,2)(0,0,2)[16] with zero mean     : 5992.86
##  ARIMA(0,0,2)(0,0,2)[16] with non-zero mean : 5940.305
##  ARIMA(0,0,2)(1,0,0)[16] with zero mean     : 5986.603
##  ARIMA(0,0,2)(1,0,0)[16] with non-zero mean : 5938.173
##  ARIMA(0,0,2)(1,0,1)[16] with zero mean     : 5977.252
##  ARIMA(0,0,2)(1,0,1)[16] with non-zero mean : 5940.24
##  ARIMA(0,0,2)(1,0,2)[16] with zero mean     : Inf
##  ARIMA(0,0,2)(1,0,2)[16] with non-zero mean : 5942.317
##  ARIMA(0,0,2)(2,0,0)[16] with zero mean     : 5983.547
##  ARIMA(0,0,2)(2,0,0)[16] with non-zero mean : 5940.24
##  ARIMA(0,0,2)(2,0,1)[16] with zero mean     : Inf
##  ARIMA(0,0,2)(2,0,1)[16] with non-zero mean : 5942.318
##  ARIMA(0,0,3)            with zero mean     : 5942.598
##  ARIMA(0,0,3)            with non-zero mean : 5884.053
##  ARIMA(0,0,3)(0,0,1)[16] with zero mean     : 5917.294
##  ARIMA(0,0,3)(0,0,1)[16] with non-zero mean : 5873.652
##  ARIMA(0,0,3)(0,0,2)[16] with zero mean     : 5915.625
##  ARIMA(0,0,3)(0,0,2)[16] with non-zero mean : 5875.614
##  ARIMA(0,0,3)(1,0,0)[16] with zero mean     : 5911.389
##  ARIMA(0,0,3)(1,0,0)[16] with non-zero mean : 5873.482
##  ARIMA(0,0,3)(1,0,1)[16] with zero mean     : 5904.144
##  ARIMA(0,0,3)(1,0,1)[16] with non-zero mean : 5875.549
##  ARIMA(0,0,3)(2,0,0)[16] with zero mean     : 5910.439
##  ARIMA(0,0,3)(2,0,0)[16] with non-zero mean : 5875.553
##  ARIMA(0,0,4)            with zero mean     : 5921.67
##  ARIMA(0,0,4)            with non-zero mean : 5877.716
##  ARIMA(0,0,4)(0,0,1)[16] with zero mean     : 5902.592
##  ARIMA(0,0,4)(0,0,1)[16] with non-zero mean : 5868.317
##  ARIMA(0,0,4)(1,0,0)[16] with zero mean     : 5898.323
##  ARIMA(0,0,4)(1,0,0)[16] with non-zero mean : 5867.987
##  ARIMA(0,0,5)            with zero mean     : 5918.286
##  ARIMA(0,0,5)            with non-zero mean : 5879.596
##  ARIMA(1,0,0)            with zero mean     : 5928.848
##  ARIMA(1,0,0)            with non-zero mean : 5911.463
##  ARIMA(1,0,0)(0,0,1)[16] with zero mean     : 5913.662
##  ARIMA(1,0,0)(0,0,1)[16] with non-zero mean : 5898.296
##  ARIMA(1,0,0)(0,0,2)[16] with zero mean     : 5915.174
##  ARIMA(1,0,0)(0,0,2)[16] with non-zero mean : 5900.264
##  ARIMA(1,0,0)(1,0,0)[16] with zero mean     : 5912.762
##  ARIMA(1,0,0)(1,0,0)[16] with non-zero mean : 5898.366
##  ARIMA(1,0,0)(1,0,1)[16] with zero mean     : 5914.729
##  ARIMA(1,0,0)(1,0,1)[16] with non-zero mean : 5900.247
##  ARIMA(1,0,0)(1,0,2)[16] with zero mean     : 5915.576
##  ARIMA(1,0,0)(1,0,2)[16] with non-zero mean : Inf
##  ARIMA(1,0,0)(2,0,0)[16] with zero mean     : 5914.766
##  ARIMA(1,0,0)(2,0,0)[16] with non-zero mean : 5900.282
##  ARIMA(1,0,0)(2,0,1)[16] with zero mean     : Inf
##  ARIMA(1,0,0)(2,0,1)[16] with non-zero mean : Inf
##  ARIMA(1,0,0)(2,0,2)[16] with zero mean     : Inf
##  ARIMA(1,0,0)(2,0,2)[16] with non-zero mean : 5904.293
##  ARIMA(1,0,1)            with zero mean     : 5929.506
##  ARIMA(1,0,1)            with non-zero mean : 5909.434
##  ARIMA(1,0,1)(0,0,1)[16] with zero mean     : 5914.708
##  ARIMA(1,0,1)(0,0,1)[16] with non-zero mean : 5897.18
##  ARIMA(1,0,1)(0,0,2)[16] with zero mean     : 5915.989
##  ARIMA(1,0,1)(0,0,2)[16] with non-zero mean : 5899.016
##  ARIMA(1,0,1)(1,0,0)[16] with zero mean     : 5913.493
##  ARIMA(1,0,1)(1,0,0)[16] with non-zero mean : 5896.933
##  ARIMA(1,0,1)(1,0,1)[16] with zero mean     : 5915.217
##  ARIMA(1,0,1)(1,0,1)[16] with non-zero mean : 5898.97
##  ARIMA(1,0,1)(1,0,2)[16] with zero mean     : 5915.94
##  ARIMA(1,0,1)(1,0,2)[16] with non-zero mean : 5900.99
##  ARIMA(1,0,1)(2,0,0)[16] with zero mean     : 5915.382
##  ARIMA(1,0,1)(2,0,0)[16] with non-zero mean : 5898.974
##  ARIMA(1,0,1)(2,0,1)[16] with zero mean     : Inf
##  ARIMA(1,0,1)(2,0,1)[16] with non-zero mean : 5901.041
##  ARIMA(1,0,2)            with zero mean     : 5926.647
##  ARIMA(1,0,2)            with non-zero mean : 5903.617
##  ARIMA(1,0,2)(0,0,1)[16] with zero mean     : 5912.013
##  ARIMA(1,0,2)(0,0,1)[16] with non-zero mean : 5892.174
##  ARIMA(1,0,2)(0,0,2)[16] with zero mean     : 5913.573
##  ARIMA(1,0,2)(0,0,2)[16] with non-zero mean : 5894.22
##  ARIMA(1,0,2)(1,0,0)[16] with zero mean     : 5910.984
##  ARIMA(1,0,2)(1,0,0)[16] with non-zero mean : 5892.276
##  ARIMA(1,0,2)(1,0,1)[16] with zero mean     : 5912.652
##  ARIMA(1,0,2)(1,0,1)[16] with non-zero mean : 5894.206
##  ARIMA(1,0,2)(2,0,0)[16] with zero mean     : 5912.904
##  ARIMA(1,0,2)(2,0,0)[16] with non-zero mean : 5894.253
##  ARIMA(1,0,3)            with zero mean     : 5911.226
##  ARIMA(1,0,3)            with non-zero mean : 5877.901
##  ARIMA(1,0,3)(0,0,1)[16] with zero mean     : 5896.345
##  ARIMA(1,0,3)(0,0,1)[16] with non-zero mean : 5868.702
##  ARIMA(1,0,3)(1,0,0)[16] with zero mean     : 5893.882
##  ARIMA(1,0,3)(1,0,0)[16] with non-zero mean : 5868.454
##  ARIMA(1,0,4)            with zero mean     : Inf
##  ARIMA(1,0,4)            with non-zero mean : 5879.617
##  ARIMA(2,0,0)            with zero mean     : 5929.216
##  ARIMA(2,0,0)            with non-zero mean : 5907.817
##  ARIMA(2,0,0)(0,0,1)[16] with zero mean     : 5914.481
##  ARIMA(2,0,0)(0,0,1)[16] with non-zero mean : 5895.919
##  ARIMA(2,0,0)(0,0,2)[16] with zero mean     : 5915.699
##  ARIMA(2,0,0)(0,0,2)[16] with non-zero mean : 5897.689
##  ARIMA(2,0,0)(1,0,0)[16] with zero mean     : 5913.188
##  ARIMA(2,0,0)(1,0,0)[16] with non-zero mean : 5895.566
##  ARIMA(2,0,0)(1,0,1)[16] with zero mean     : 5914.807
##  ARIMA(2,0,0)(1,0,1)[16] with non-zero mean : 5897.626
##  ARIMA(2,0,0)(1,0,2)[16] with zero mean     : 5915.466
##  ARIMA(2,0,0)(1,0,2)[16] with non-zero mean : 5899.655
##  ARIMA(2,0,0)(2,0,0)[16] with zero mean     : 5914.183
##  ARIMA(2,0,0)(2,0,0)[16] with non-zero mean : 5897.627
##  ARIMA(2,0,0)(2,0,1)[16] with zero mean     : Inf
##  ARIMA(2,0,0)(2,0,1)[16] with non-zero mean : 5899.699
##  ARIMA(2,0,1)            with zero mean     : 5930.771
##  ARIMA(2,0,1)            with non-zero mean : 5902.491
##  ARIMA(2,0,1)(0,0,1)[16] with zero mean     : 5915.467
##  ARIMA(2,0,1)(0,0,1)[16] with non-zero mean : 5890.756
##  ARIMA(2,0,1)(0,0,2)[16] with zero mean     : 5916.721
##  ARIMA(2,0,1)(0,0,2)[16] with non-zero mean : 5892.439
##  ARIMA(2,0,1)(1,0,0)[16] with zero mean     : 5914.155
##  ARIMA(2,0,1)(1,0,0)[16] with non-zero mean : 5890.223
##  ARIMA(2,0,1)(1,0,1)[16] with zero mean     : 5914.858
##  ARIMA(2,0,1)(1,0,1)[16] with non-zero mean : 5892.293
##  ARIMA(2,0,1)(2,0,0)[16] with zero mean     : 5916.06
##  ARIMA(2,0,1)(2,0,0)[16] with non-zero mean : 5892.295
##  ARIMA(2,0,2)            with zero mean     : 5925.483
##  ARIMA(2,0,2)            with non-zero mean : 5891.825
##  ARIMA(2,0,2)(0,0,1)[16] with zero mean     : 5910.169
##  ARIMA(2,0,2)(0,0,1)[16] with non-zero mean : 5880.613
##  ARIMA(2,0,2)(1,0,0)[16] with zero mean     : 5908.505
##  ARIMA(2,0,2)(1,0,0)[16] with non-zero mean : 5880.4
##  ARIMA(2,0,3)            with zero mean     : 5911.948
##  ARIMA(2,0,3)            with non-zero mean : 5879.561
##  ARIMA(3,0,0)            with zero mean     : 5928.15
##  ARIMA(3,0,0)            with non-zero mean : 5900.006
##  ARIMA(3,0,0)(0,0,1)[16] with zero mean     : 5913.108
##  ARIMA(3,0,0)(0,0,1)[16] with non-zero mean : 5888.623
##  ARIMA(3,0,0)(0,0,2)[16] with zero mean     : 5914.335
##  ARIMA(3,0,0)(0,0,2)[16] with non-zero mean : 5890.524
##  ARIMA(3,0,0)(1,0,0)[16] with zero mean     : 5911.652
##  ARIMA(3,0,0)(1,0,0)[16] with non-zero mean : 5888.37
##  ARIMA(3,0,0)(1,0,1)[16] with zero mean     : 5912.781
##  ARIMA(3,0,0)(1,0,1)[16] with non-zero mean : 5890.44
##  ARIMA(3,0,0)(2,0,0)[16] with zero mean     : 5913.366
##  ARIMA(3,0,0)(2,0,0)[16] with non-zero mean : 5890.443
##  ARIMA(3,0,1)            with zero mean     : 5930.061
##  ARIMA(3,0,1)            with non-zero mean : 5899.854
##  ARIMA(3,0,1)(0,0,1)[16] with zero mean     : 5914.874
##  ARIMA(3,0,1)(0,0,1)[16] with non-zero mean : 5888.16
##  ARIMA(3,0,1)(1,0,0)[16] with zero mean     : 5913.344
##  ARIMA(3,0,1)(1,0,0)[16] with non-zero mean : 5887.858
##  ARIMA(3,0,2)            with zero mean     : 5933.983
##  ARIMA(3,0,2)            with non-zero mean : 5887.709
##  ARIMA(4,0,0)            with zero mean     : 5929.24
##  ARIMA(4,0,0)            with non-zero mean : 5894.763
##  ARIMA(4,0,0)(0,0,1)[16] with zero mean     : 5913.391
##  ARIMA(4,0,0)(0,0,1)[16] with non-zero mean : 5882.648
##  ARIMA(4,0,0)(1,0,0)[16] with zero mean     : 5911.555
##  ARIMA(4,0,0)(1,0,0)[16] with non-zero mean : 5882.25
##  ARIMA(4,0,1)            with zero mean     : 5925.197
##  ARIMA(4,0,1)            with non-zero mean : 5891.308
##  ARIMA(5,0,0)            with zero mean     : 5906.657
##  ARIMA(5,0,0)            with non-zero mean : 5884.985
## 
## 
## 
##  Best model: ARIMA(0,0,4)(1,0,0)[16] with non-zero mean 
## 
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6411.073
##  ARIMA(0,0,0) with non-zero mean : 6236.361
##  ARIMA(0,0,1) with zero mean     : 6142.057
##  ARIMA(0,0,1) with non-zero mean : 6024.666
##  ARIMA(0,0,2) with zero mean     : 6059.217
##  ARIMA(0,0,2) with non-zero mean : 5972.392
##  ARIMA(0,0,3) with zero mean     : 5957.704
##  ARIMA(0,0,3) with non-zero mean : 5899.022
##  ARIMA(0,0,4) with zero mean     : 5936.713
##  ARIMA(0,0,4) with non-zero mean : 5892.644
##  ARIMA(0,0,5) with zero mean     : 5933.312
##  ARIMA(0,0,5) with non-zero mean : 5894.52
##  ARIMA(1,0,0) with zero mean     : 5943.923
##  ARIMA(1,0,0) with non-zero mean : 5926.474
##  ARIMA(1,0,1) with zero mean     : 5944.579
##  ARIMA(1,0,1) with non-zero mean : 5924.436
##  ARIMA(1,0,2) with zero mean     : 5941.704
##  ARIMA(1,0,2) with non-zero mean : 5918.603
##  ARIMA(1,0,3) with zero mean     : 5926.236
##  ARIMA(1,0,3) with non-zero mean : 5892.825
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 5894.542
##  ARIMA(2,0,0) with zero mean     : 5944.289
##  ARIMA(2,0,0) with non-zero mean : 5922.817
##  ARIMA(2,0,1) with zero mean     : 5945.842
##  ARIMA(2,0,1) with non-zero mean : 5917.489
##  ARIMA(2,0,2) with zero mean     : 5940.532
##  ARIMA(2,0,2) with non-zero mean : 5906.798
##  ARIMA(2,0,3) with zero mean     : 5926.952
##  ARIMA(2,0,3) with non-zero mean : 5894.486
##  ARIMA(3,0,0) with zero mean     : 5943.214
##  ARIMA(3,0,0) with non-zero mean : 5914.994
##  ARIMA(3,0,1) with zero mean     : 5945.125
##  ARIMA(3,0,1) with non-zero mean : 5914.844
##  ARIMA(3,0,2) with zero mean     : 5949.048
##  ARIMA(3,0,2) with non-zero mean : 5902.651
##  ARIMA(4,0,0) with zero mean     : 5944.301
##  ARIMA(4,0,0) with non-zero mean : 5909.75
##  ARIMA(4,0,1) with zero mean     : 5940.242
##  ARIMA(4,0,1) with non-zero mean : 5906.275
##  ARIMA(5,0,0) with zero mean     : 5921.646
##  ARIMA(5,0,0) with non-zero mean : 5899.919
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6427.441
##  ARIMA(0,0,0) with non-zero mean : 6252.459
##  ARIMA(0,0,1) with zero mean     : 6157.663
##  ARIMA(0,0,1) with non-zero mean : 6040.13
##  ARIMA(0,0,2) with zero mean     : 6074.59
##  ARIMA(0,0,2) with non-zero mean : 5987.647
##  ARIMA(0,0,3) with zero mean     : 5972.797
##  ARIMA(0,0,3) with non-zero mean : 5914.014
##  ARIMA(0,0,4) with zero mean     : 5951.733
##  ARIMA(0,0,4) with non-zero mean : 5907.603
##  ARIMA(0,0,5) with zero mean     : 5948.315
##  ARIMA(0,0,5) with non-zero mean : 5909.476
##  ARIMA(1,0,0) with zero mean     : 5958.974
##  ARIMA(1,0,0) with non-zero mean : 5941.511
##  ARIMA(1,0,1) with zero mean     : 5959.626
##  ARIMA(1,0,1) with non-zero mean : 5939.472
##  ARIMA(1,0,2) with zero mean     : 5956.74
##  ARIMA(1,0,2) with non-zero mean : 5933.614
##  ARIMA(1,0,3) with zero mean     : 5941.224
##  ARIMA(1,0,3) with non-zero mean : 5907.777
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 5909.497
##  ARIMA(2,0,0) with zero mean     : 5959.336
##  ARIMA(2,0,0) with non-zero mean : 5937.855
##  ARIMA(2,0,1) with zero mean     : 5960.886
##  ARIMA(2,0,1) with non-zero mean : 5932.536
##  ARIMA(2,0,2) with zero mean     : 5955.56
##  ARIMA(2,0,2) with non-zero mean : 5921.801
##  ARIMA(2,0,3) with zero mean     : 5941.936
##  ARIMA(2,0,3) with non-zero mean : 5909.443
##  ARIMA(3,0,0) with zero mean     : 5958.254
##  ARIMA(3,0,0) with non-zero mean : 5930.021
##  ARIMA(3,0,1) with zero mean     : 5960.163
##  ARIMA(3,0,1) with non-zero mean : 5929.876
##  ARIMA(3,0,2) with zero mean     : 5964.043
##  ARIMA(3,0,2) with non-zero mean : 5917.618
##  ARIMA(4,0,0) with zero mean     : 5959.338
##  ARIMA(4,0,0) with non-zero mean : 5924.785
##  ARIMA(4,0,1) with zero mean     : 5955.262
##  ARIMA(4,0,1) with non-zero mean : 5921.289
##  ARIMA(5,0,0) with zero mean     : 5936.613
##  ARIMA(5,0,0) with non-zero mean : 5914.888
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6443.865
##  ARIMA(0,0,0) with non-zero mean : 6268.435
##  ARIMA(0,0,1) with zero mean     : 6173.382
##  ARIMA(0,0,1) with non-zero mean : 6055.43
##  ARIMA(0,0,2) with zero mean     : 6090.049
##  ARIMA(0,0,2) with non-zero mean : 6002.788
##  ARIMA(0,0,3) with zero mean     : 5987.993
##  ARIMA(0,0,3) with non-zero mean : 5928.924
##  ARIMA(0,0,4) with zero mean     : 5966.857
##  ARIMA(0,0,4) with non-zero mean : 5922.489
##  ARIMA(0,0,5) with zero mean     : 5963.413
##  ARIMA(0,0,5) with non-zero mean : 5924.36
##  ARIMA(1,0,0) with zero mean     : 5974.091
##  ARIMA(1,0,0) with non-zero mean : 5956.506
##  ARIMA(1,0,1) with zero mean     : 5974.742
##  ARIMA(1,0,1) with non-zero mean : 5954.456
##  ARIMA(1,0,2) with zero mean     : 5971.838
##  ARIMA(1,0,2) with non-zero mean : 5948.576
##  ARIMA(1,0,3) with zero mean     : 5956.299
##  ARIMA(1,0,3) with non-zero mean : 5922.662
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 5924.381
##  ARIMA(2,0,0) with zero mean     : 5974.451
##  ARIMA(2,0,0) with non-zero mean : 5952.834
##  ARIMA(2,0,1) with zero mean     : 5976.002
##  ARIMA(2,0,1) with non-zero mean : 5947.503
##  ARIMA(2,0,2) with zero mean     : 5970.652
##  ARIMA(2,0,2) with non-zero mean : 5936.729
##  ARIMA(2,0,3) with zero mean     : 5957.005
##  ARIMA(2,0,3) with non-zero mean : 5924.327
##  ARIMA(3,0,0) with zero mean     : 5973.359
##  ARIMA(3,0,0) with non-zero mean : 5944.976
##  ARIMA(3,0,1) with zero mean     : 5975.269
##  ARIMA(3,0,1) with non-zero mean : 5944.828
##  ARIMA(3,0,2) with zero mean     : 5979.166
##  ARIMA(3,0,2) with non-zero mean : 5932.525
##  ARIMA(4,0,0) with zero mean     : 5974.444
##  ARIMA(4,0,0) with non-zero mean : 5939.724
##  ARIMA(4,0,1) with zero mean     : 5970.356
##  ARIMA(4,0,1) with non-zero mean : 5936.211
##  ARIMA(5,0,0) with zero mean     : 5951.655
##  ARIMA(5,0,0) with non-zero mean : 5929.788
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6460.525
##  ARIMA(0,0,0) with non-zero mean : 6284.275
##  ARIMA(0,0,1) with zero mean     : 6189.3
##  ARIMA(0,0,1) with non-zero mean : 6070.695
##  ARIMA(0,0,2) with zero mean     : 6105.797
##  ARIMA(0,0,2) with non-zero mean : 6017.938
##  ARIMA(0,0,3) with zero mean     : 6003.444
##  ARIMA(0,0,3) with non-zero mean : 5943.901
##  ARIMA(0,0,4) with zero mean     : 5982.251
##  ARIMA(0,0,4) with non-zero mean : 5937.465
##  ARIMA(0,0,5) with zero mean     : 5978.782
##  ARIMA(0,0,5) with non-zero mean : 5939.338
##  ARIMA(1,0,0) with zero mean     : 5989.48
##  ARIMA(1,0,0) with non-zero mean : 5971.634
##  ARIMA(1,0,1) with zero mean     : 5990.12
##  ARIMA(1,0,1) with non-zero mean : 5969.553
##  ARIMA(1,0,2) with zero mean     : 5987.224
##  ARIMA(1,0,2) with non-zero mean : 5963.659
##  ARIMA(1,0,3) with zero mean     : 5971.637
##  ARIMA(1,0,3) with non-zero mean : 5937.644
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 5939.375
##  ARIMA(2,0,0) with zero mean     : 5989.827
##  ARIMA(2,0,0) with non-zero mean : 5967.918
##  ARIMA(2,0,1) with zero mean     : 5991.371
##  ARIMA(2,0,1) with non-zero mean : 5962.536
##  ARIMA(2,0,2) with zero mean     : 5986.037
##  ARIMA(2,0,2) with non-zero mean : 5951.741
##  ARIMA(2,0,3) with zero mean     : 5972.331
##  ARIMA(2,0,3) with non-zero mean : 5939.304
##  ARIMA(3,0,0) with zero mean     : 5988.74
##  ARIMA(3,0,0) with non-zero mean : 5960.018
##  ARIMA(3,0,1) with zero mean     : 5990.65
##  ARIMA(3,0,1) with non-zero mean : 5959.852
##  ARIMA(3,0,2) with zero mean     : 5994.578
##  ARIMA(3,0,2) with non-zero mean : 5947.546
##  ARIMA(4,0,0) with zero mean     : 5989.823
##  ARIMA(4,0,0) with non-zero mean : 5954.721
##  ARIMA(4,0,1) with zero mean     : 5985.709
##  ARIMA(4,0,1) with non-zero mean : 5951.193
##  ARIMA(5,0,0) with zero mean     : 5966.948
##  ARIMA(5,0,0) with non-zero mean : 5944.777
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6477.126
##  ARIMA(0,0,0) with non-zero mean : 6300.121
##  ARIMA(0,0,1) with zero mean     : 6205.004
##  ARIMA(0,0,1) with non-zero mean : 6085.987
##  ARIMA(0,0,2) with zero mean     : 6121.194
##  ARIMA(0,0,2) with non-zero mean : 6033.099
##  ARIMA(0,0,3) with zero mean     : 6018.542
##  ARIMA(0,0,3) with non-zero mean : 5958.844
##  ARIMA(0,0,4) with zero mean     : 5997.263
##  ARIMA(0,0,4) with non-zero mean : 5952.394
##  ARIMA(0,0,5) with zero mean     : 5993.777
##  ARIMA(0,0,5) with non-zero mean : 5954.265
##  ARIMA(1,0,0) with zero mean     : 6004.527
##  ARIMA(1,0,0) with non-zero mean : 5986.636
##  ARIMA(1,0,1) with zero mean     : 6005.161
##  ARIMA(1,0,1) with non-zero mean : 5984.556
##  ARIMA(1,0,2) with zero mean     : 6002.251
##  ARIMA(1,0,2) with non-zero mean : 5978.644
##  ARIMA(1,0,3) with zero mean     : 5986.615
##  ARIMA(1,0,3) with non-zero mean : 5952.568
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 5954.287
##  ARIMA(2,0,0) with zero mean     : 6004.868
##  ARIMA(2,0,0) with non-zero mean : 5982.924
##  ARIMA(2,0,1) with zero mean     : 6006.412
##  ARIMA(2,0,1) with non-zero mean : 5977.545
##  ARIMA(2,0,2) with zero mean     : 6001.055
##  ARIMA(2,0,2) with non-zero mean : 5966.715
##  ARIMA(2,0,3) with zero mean     : 5987.305
##  ARIMA(2,0,3) with non-zero mean : 5954.232
##  ARIMA(3,0,0) with zero mean     : 6003.771
##  ARIMA(3,0,0) with non-zero mean : 5975.018
##  ARIMA(3,0,1) with zero mean     : 6005.681
##  ARIMA(3,0,1) with non-zero mean : 5974.852
##  ARIMA(3,0,2) with zero mean     : 6009.584
##  ARIMA(3,0,2) with non-zero mean : 5962.49
##  ARIMA(4,0,0) with zero mean     : 6004.851
##  ARIMA(4,0,0) with non-zero mean : 5969.711
##  ARIMA(4,0,1) with zero mean     : 6000.722
##  ARIMA(4,0,1) with non-zero mean : 5966.162
##  ARIMA(5,0,0) with zero mean     : 5981.909
##  ARIMA(5,0,0) with non-zero mean : 5959.712
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6493.708
##  ARIMA(0,0,0) with non-zero mean : 6315.97
##  ARIMA(0,0,1) with zero mean     : 6220.824
##  ARIMA(0,0,1) with non-zero mean : 6101.242
##  ARIMA(0,0,2) with zero mean     : 6136.698
##  ARIMA(0,0,2) with non-zero mean : 6048.213
##  ARIMA(0,0,3) with zero mean     : 6033.648
##  ARIMA(0,0,3) with non-zero mean : 5973.783
##  ARIMA(0,0,4) with zero mean     : 6012.288
##  ARIMA(0,0,4) with non-zero mean : 5967.309
##  ARIMA(0,0,5) with zero mean     : 6008.774
##  ARIMA(0,0,5) with non-zero mean : 5969.181
##  ARIMA(1,0,0) with zero mean     : 6019.581
##  ARIMA(1,0,0) with non-zero mean : 6001.628
##  ARIMA(1,0,1) with zero mean     : 6020.215
##  ARIMA(1,0,1) with non-zero mean : 5999.536
##  ARIMA(1,0,2) with zero mean     : 6017.279
##  ARIMA(1,0,2) with non-zero mean : 5993.619
##  ARIMA(1,0,3) with zero mean     : 6001.594
##  ARIMA(1,0,3) with non-zero mean : 5967.485
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 5969.202
##  ARIMA(2,0,0) with zero mean     : 6019.921
##  ARIMA(2,0,0) with non-zero mean : 5997.9
##  ARIMA(2,0,1) with zero mean     : 6021.464
##  ARIMA(2,0,1) with non-zero mean : 5992.514
##  ARIMA(2,0,2) with zero mean     : 6016.071
##  ARIMA(2,0,2) with non-zero mean : 5981.685
##  ARIMA(2,0,3) with zero mean     : 6002.278
##  ARIMA(2,0,3) with non-zero mean : 5969.148
##  ARIMA(3,0,0) with zero mean     : 6018.809
##  ARIMA(3,0,0) with non-zero mean : 5989.986
##  ARIMA(3,0,1) with zero mean     : 6020.718
##  ARIMA(3,0,1) with non-zero mean : 5989.823
##  ARIMA(3,0,2) with zero mean     : 6024.638
##  ARIMA(3,0,2) with non-zero mean : 5977.429
##  ARIMA(4,0,0) with zero mean     : 6019.886
##  ARIMA(4,0,0) with non-zero mean : 5984.677
##  ARIMA(4,0,1) with zero mean     : 6015.737
##  ARIMA(4,0,1) with non-zero mean : 5981.114
##  ARIMA(5,0,0) with zero mean     : 5996.869
##  ARIMA(5,0,0) with non-zero mean : 5974.636
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6510.13
##  ARIMA(0,0,0) with non-zero mean : 6331.928
##  ARIMA(0,0,1) with zero mean     : 6236.414
##  ARIMA(0,0,1) with non-zero mean : 6116.696
##  ARIMA(0,0,2) with zero mean     : 6152.059
##  ARIMA(0,0,2) with non-zero mean : 6063.454
##  ARIMA(0,0,3) with zero mean     : 6048.71
##  ARIMA(0,0,3) with non-zero mean : 5988.861
##  ARIMA(0,0,4) with zero mean     : 6027.292
##  ARIMA(0,0,4) with non-zero mean : 5982.374
##  ARIMA(0,0,5) with zero mean     : 6023.767
##  ARIMA(0,0,5) with non-zero mean : 5984.244
##  ARIMA(1,0,0) with zero mean     : 6034.656
##  ARIMA(1,0,0) with non-zero mean : 6016.774
##  ARIMA(1,0,1) with zero mean     : 6035.284
##  ARIMA(1,0,1) with non-zero mean : 6014.674
##  ARIMA(1,0,2) with zero mean     : 6032.32
##  ARIMA(1,0,2) with non-zero mean : 6008.712
##  ARIMA(1,0,3) with zero mean     : 6016.585
##  ARIMA(1,0,3) with non-zero mean : 5982.549
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 5984.266
##  ARIMA(2,0,0) with zero mean     : 6034.988
##  ARIMA(2,0,0) with non-zero mean : 6013.033
##  ARIMA(2,0,1) with zero mean     : 6036.536
##  ARIMA(2,0,1) with non-zero mean : 6007.66
##  ARIMA(2,0,2) with zero mean     : 6031.1
##  ARIMA(2,0,2) with non-zero mean : 5996.766
##  ARIMA(2,0,3) with zero mean     : 6017.27
##  ARIMA(2,0,3) with non-zero mean : 5984.212
##  ARIMA(3,0,0) with zero mean     : 6033.859
##  ARIMA(3,0,0) with non-zero mean : 6005.092
##  ARIMA(3,0,1) with zero mean     : 6035.767
##  ARIMA(3,0,1) with non-zero mean : 6004.945
##  ARIMA(3,0,2) with zero mean     : 6039.674
##  ARIMA(3,0,2) with non-zero mean : 5992.479
##  ARIMA(4,0,0) with zero mean     : 6034.938
##  ARIMA(4,0,0) with non-zero mean : 5999.836
##  ARIMA(4,0,1) with zero mean     : 6030.776
##  ARIMA(4,0,1) with non-zero mean : 5996.248
##  ARIMA(5,0,0) with zero mean     : 6011.85
##  ARIMA(5,0,0) with non-zero mean : 5989.709
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6526.45
##  ARIMA(0,0,0) with non-zero mean : 6348.171
##  ARIMA(0,0,1) with zero mean     : 6251.991
##  ARIMA(0,0,1) with non-zero mean : 6132.262
##  ARIMA(0,0,2) with zero mean     : 6167.41
##  ARIMA(0,0,2) with non-zero mean : 6078.952
##  ARIMA(0,0,3) with zero mean     : 6063.771
##  ARIMA(0,0,3) with non-zero mean : 6003.996
##  ARIMA(0,0,4) with zero mean     : 6042.292
##  ARIMA(0,0,4) with non-zero mean : 5997.459
##  ARIMA(0,0,5) with zero mean     : 6038.756
##  ARIMA(0,0,5) with non-zero mean : 5999.328
##  ARIMA(1,0,0) with zero mean     : 6049.799
##  ARIMA(1,0,0) with non-zero mean : 6032.081
##  ARIMA(1,0,1) with zero mean     : 6050.408
##  ARIMA(1,0,1) with non-zero mean : 6029.95
##  ARIMA(1,0,2) with zero mean     : 6047.422
##  ARIMA(1,0,2) with non-zero mean : 6023.978
##  ARIMA(1,0,3) with zero mean     : 6031.571
##  ARIMA(1,0,3) with non-zero mean : 5997.635
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 5999.352
##  ARIMA(2,0,0) with zero mean     : 6050.107
##  ARIMA(2,0,0) with non-zero mean : 6028.299
##  ARIMA(2,0,1) with zero mean     : 6051.657
##  ARIMA(2,0,1) with non-zero mean : 6022.938
##  ARIMA(2,0,2) with zero mean     : 6046.168
##  ARIMA(2,0,2) with non-zero mean : 6011.962
##  ARIMA(2,0,3) with zero mean     : 6032.258
##  ARIMA(2,0,3) with non-zero mean : 5999.295
##  ARIMA(3,0,0) with zero mean     : 6048.962
##  ARIMA(3,0,0) with non-zero mean : 6020.35
##  ARIMA(3,0,1) with zero mean     : 6050.869
##  ARIMA(3,0,1) with non-zero mean : 6020.21
##  ARIMA(3,0,2) with zero mean     : 6054.801
##  ARIMA(3,0,2) with non-zero mean : 6007.615
##  ARIMA(4,0,0) with zero mean     : 6050.032
##  ARIMA(4,0,0) with non-zero mean : 6015.083
##  ARIMA(4,0,1) with zero mean     : 6045.838
##  ARIMA(4,0,1) with non-zero mean : 6011.443
##  ARIMA(5,0,0) with zero mean     : 6026.84
##  ARIMA(5,0,0) with non-zero mean : 6004.816
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6542.781
##  ARIMA(0,0,0) with non-zero mean : 6364.329
##  ARIMA(0,0,1) with zero mean     : 6267.595
##  ARIMA(0,0,1) with non-zero mean : 6147.668
##  ARIMA(0,0,2) with zero mean     : 6182.832
##  ARIMA(0,0,2) with non-zero mean : 6094.112
##  ARIMA(0,0,3) with zero mean     : 6078.877
##  ARIMA(0,0,3) with non-zero mean : 6018.926
##  ARIMA(0,0,4) with zero mean     : 6057.375
##  ARIMA(0,0,4) with non-zero mean : 6012.338
##  ARIMA(0,0,5) with zero mean     : 6053.829
##  ARIMA(0,0,5) with non-zero mean : 6014.205
##  ARIMA(1,0,0) with zero mean     : 6064.848
##  ARIMA(1,0,0) with non-zero mean : 6047.08
##  ARIMA(1,0,1) with zero mean     : 6065.46
##  ARIMA(1,0,1) with non-zero mean : 6044.934
##  ARIMA(1,0,2) with zero mean     : 6062.476
##  ARIMA(1,0,2) with non-zero mean : 6038.936
##  ARIMA(1,0,3) with zero mean     : 6046.62
##  ARIMA(1,0,3) with non-zero mean : 6012.513
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 6014.226
##  ARIMA(2,0,0) with zero mean     : 6065.161
##  ARIMA(2,0,0) with non-zero mean : 6043.277
##  ARIMA(2,0,1) with zero mean     : 6066.7
##  ARIMA(2,0,1) with non-zero mean : 6037.888
##  ARIMA(2,0,2) with zero mean     : 6061.234
##  ARIMA(2,0,2) with non-zero mean : 6026.877
##  ARIMA(2,0,3) with zero mean     : 6047.286
##  ARIMA(2,0,3) with non-zero mean : 6014.172
##  ARIMA(3,0,0) with zero mean     : 6064.021
##  ARIMA(3,0,0) with non-zero mean : 6035.296
##  ARIMA(3,0,1) with zero mean     : 6065.928
##  ARIMA(3,0,1) with non-zero mean : 6035.151
##  ARIMA(3,0,2) with zero mean     : 6069.845
##  ARIMA(3,0,2) with non-zero mean : 6022.51
##  ARIMA(4,0,0) with zero mean     : 6065.091
##  ARIMA(4,0,0) with non-zero mean : 6030.014
##  ARIMA(4,0,1) with zero mean     : 6060.875
##  ARIMA(4,0,1) with non-zero mean : 6026.361
##  ARIMA(5,0,0) with zero mean     : 6041.812
##  ARIMA(5,0,0) with non-zero mean : 6019.71
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6559.094
##  ARIMA(0,0,0) with non-zero mean : 6380.584
##  ARIMA(0,0,1) with zero mean     : 6283.162
##  ARIMA(0,0,1) with non-zero mean : 6163.289
##  ARIMA(0,0,2) with zero mean     : 6198.17
##  ARIMA(0,0,2) with non-zero mean : 6109.503
##  ARIMA(0,0,3) with zero mean     : 6093.933
##  ARIMA(0,0,3) with non-zero mean : 6033.971
##  ARIMA(0,0,4) with zero mean     : 6072.368
##  ARIMA(0,0,4) with non-zero mean : 6027.342
##  ARIMA(0,0,5) with zero mean     : 6068.806
##  ARIMA(0,0,5) with non-zero mean : 6029.196
##  ARIMA(1,0,0) with zero mean     : 6079.883
##  ARIMA(1,0,0) with non-zero mean : 6062.171
##  ARIMA(1,0,1) with zero mean     : 6080.491
##  ARIMA(1,0,1) with non-zero mean : 6060.036
##  ARIMA(1,0,2) with zero mean     : 6077.488
##  ARIMA(1,0,2) with non-zero mean : 6053.981
##  ARIMA(1,0,3) with zero mean     : 6061.583
##  ARIMA(1,0,3) with non-zero mean : 6027.496
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 6029.218
##  ARIMA(2,0,0) with zero mean     : 6080.191
##  ARIMA(2,0,0) with non-zero mean : 6058.382
##  ARIMA(2,0,1) with zero mean     : 6081.729
##  ARIMA(2,0,1) with non-zero mean : 6052.977
##  ARIMA(2,0,2) with zero mean     : 6076.237
##  ARIMA(2,0,2) with non-zero mean : 6041.887
##  ARIMA(2,0,3) with zero mean     : 6062.246
##  ARIMA(2,0,3) with non-zero mean : 6029.165
##  ARIMA(3,0,0) with zero mean     : 6079.038
##  ARIMA(3,0,0) with non-zero mean : 6050.357
##  ARIMA(3,0,1) with zero mean     : 6080.944
##  ARIMA(3,0,1) with non-zero mean : 6050.203
##  ARIMA(3,0,2) with zero mean     : 6085.093
##  ARIMA(3,0,2) with non-zero mean : 6037.523
##  ARIMA(4,0,0) with zero mean     : 6080.105
##  ARIMA(4,0,0) with non-zero mean : 6045.045
##  ARIMA(4,0,1) with zero mean     : 6075.871
##  ARIMA(4,0,1) with non-zero mean : 6041.364
##  ARIMA(5,0,0) with zero mean     : 6056.76
##  ARIMA(5,0,0) with non-zero mean : 6034.691
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6575.409
##  ARIMA(0,0,0) with non-zero mean : 6396.792
##  ARIMA(0,0,1) with zero mean     : 6298.758
##  ARIMA(0,0,1) with non-zero mean : 6178.712
##  ARIMA(0,0,2) with zero mean     : 6213.517
##  ARIMA(0,0,2) with non-zero mean : 6124.772
##  ARIMA(0,0,3) with zero mean     : 6108.991
##  ARIMA(0,0,3) with non-zero mean : 6048.978
##  ARIMA(0,0,4) with zero mean     : 6087.362
##  ARIMA(0,0,4) with non-zero mean : 6042.298
##  ARIMA(0,0,5) with zero mean     : 6083.782
##  ARIMA(0,0,5) with non-zero mean : 6044.148
##  ARIMA(1,0,0) with zero mean     : 6094.915
##  ARIMA(1,0,0) with non-zero mean : 6077.183
##  ARIMA(1,0,1) with zero mean     : 6095.522
##  ARIMA(1,0,1) with non-zero mean : 6075.039
##  ARIMA(1,0,2) with zero mean     : 6092.5
##  ARIMA(1,0,2) with non-zero mean : 6068.988
##  ARIMA(1,0,3) with zero mean     : 6076.547
##  ARIMA(1,0,3) with non-zero mean : 6042.444
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 6044.17
##  ARIMA(2,0,0) with zero mean     : 6095.221
##  ARIMA(2,0,0) with non-zero mean : 6073.385
##  ARIMA(2,0,1) with zero mean     : 6096.766
##  ARIMA(2,0,1) with non-zero mean : 6067.976
##  ARIMA(2,0,2) with zero mean     : 6091.241
##  ARIMA(2,0,2) with non-zero mean : 6056.901
##  ARIMA(2,0,3) with zero mean     : 6077.207
##  ARIMA(2,0,3) with non-zero mean : 6044.12
##  ARIMA(3,0,0) with zero mean     : 6094.059
##  ARIMA(3,0,0) with non-zero mean : 6065.363
##  ARIMA(3,0,1) with zero mean     : 6095.965
##  ARIMA(3,0,1) with non-zero mean : 6065.203
##  ARIMA(3,0,2) with zero mean     : 6099.879
##  ARIMA(3,0,2) with non-zero mean : 6052.519
##  ARIMA(4,0,0) with zero mean     : 6095.127
##  ARIMA(4,0,0) with non-zero mean : 6060.024
##  ARIMA(4,0,1) with zero mean     : 6090.876
##  ARIMA(4,0,1) with non-zero mean : 6056.334
##  ARIMA(5,0,0) with zero mean     : 6071.702
##  ARIMA(5,0,0) with non-zero mean : 6049.647
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6591.854
##  ARIMA(0,0,0) with non-zero mean : 6412.7
##  ARIMA(0,0,1) with zero mean     : 6314.502
##  ARIMA(0,0,1) with non-zero mean : 6193.966
##  ARIMA(0,0,2) with zero mean     : 6229.147
##  ARIMA(0,0,2) with non-zero mean : 6139.871
##  ARIMA(0,0,3) with zero mean     : 6124.307
##  ARIMA(0,0,3) with non-zero mean : 6063.881
##  ARIMA(0,0,4) with zero mean     : 6102.601
##  ARIMA(0,0,4) with non-zero mean : 6057.187
##  ARIMA(0,0,5) with zero mean     : 6098.999
##  ARIMA(0,0,5) with non-zero mean : 6059.039
##  ARIMA(1,0,0) with zero mean     : 6110.213
##  ARIMA(1,0,0) with non-zero mean : 6092.229
##  ARIMA(1,0,1) with zero mean     : 6110.815
##  ARIMA(1,0,1) with non-zero mean : 6090.062
##  ARIMA(1,0,2) with zero mean     : 6107.8
##  ARIMA(1,0,2) with non-zero mean : 6083.995
##  ARIMA(1,0,3) with zero mean     : 6091.745
##  ARIMA(1,0,3) with non-zero mean : 6057.339
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 6059.06
##  ARIMA(2,0,0) with zero mean     : 6110.515
##  ARIMA(2,0,0) with non-zero mean : 6088.399
##  ARIMA(2,0,1) with zero mean     : 6112.041
##  ARIMA(2,0,1) with non-zero mean : 6082.957
##  ARIMA(2,0,2) with zero mean     : 6106.526
##  ARIMA(2,0,2) with non-zero mean : 6071.829
##  ARIMA(2,0,3) with zero mean     : 6092.405
##  ARIMA(2,0,3) with non-zero mean : 6059.009
##  ARIMA(3,0,0) with zero mean     : 6109.361
##  ARIMA(3,0,0) with non-zero mean : 6080.34
##  ARIMA(3,0,1) with zero mean     : 6111.267
##  ARIMA(3,0,1) with non-zero mean : 6080.166
##  ARIMA(3,0,2) with zero mean     : 6115.199
##  ARIMA(3,0,2) with non-zero mean : 6067.443
##  ARIMA(4,0,0) with zero mean     : 6110.424
##  ARIMA(4,0,0) with non-zero mean : 6074.96
##  ARIMA(4,0,1) with zero mean     : 6106.144
##  ARIMA(4,0,1) with non-zero mean : 6071.252
##  ARIMA(5,0,0) with zero mean     : 6086.83
##  ARIMA(5,0,0) with non-zero mean : 6064.546
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
## 
##  ARIMA(0,0,0) with zero mean     : 6608.296
##  ARIMA(0,0,0) with non-zero mean : 6428.606
##  ARIMA(0,0,1) with zero mean     : 6330.121
##  ARIMA(0,0,1) with non-zero mean : 6209.306
##  ARIMA(0,0,2) with zero mean     : 6244.493
##  ARIMA(0,0,2) with non-zero mean : 6155.067
##  ARIMA(0,0,3) with zero mean     : 6139.405
##  ARIMA(0,0,3) with non-zero mean : 6078.792
##  ARIMA(0,0,4) with zero mean     : 6117.614
##  ARIMA(0,0,4) with non-zero mean : 6072.074
##  ARIMA(0,0,5) with zero mean     : 6113.99
##  ARIMA(0,0,5) with non-zero mean : 6073.924
##  ARIMA(1,0,0) with zero mean     : 6125.247
##  ARIMA(1,0,0) with non-zero mean : 6107.211
##  ARIMA(1,0,1) with zero mean     : 6125.841
##  ARIMA(1,0,1) with non-zero mean : 6105.048
##  ARIMA(1,0,2) with zero mean     : 6122.815
##  ARIMA(1,0,2) with non-zero mean : 6098.955
##  ARIMA(1,0,3) with zero mean     : 6106.716
##  ARIMA(1,0,3) with non-zero mean : 6072.223
##  ARIMA(1,0,4) with zero mean     : Inf
##  ARIMA(1,0,4) with non-zero mean : 6073.946
##  ARIMA(2,0,0) with zero mean     : 6125.539
##  ARIMA(2,0,0) with non-zero mean : 6103.388
##  ARIMA(2,0,1) with zero mean     : 6127.066
##  ARIMA(2,0,1) with non-zero mean : 6097.953
##  ARIMA(2,0,2) with zero mean     : 6121.532
##  ARIMA(2,0,2) with non-zero mean : 6086.778
##  ARIMA(2,0,3) with zero mean     : 6107.374
##  ARIMA(2,0,3) with non-zero mean : 6073.896
##  ARIMA(3,0,0) with zero mean     : 6124.377
##  ARIMA(3,0,0) with non-zero mean : 6095.322
##  ARIMA(3,0,1) with zero mean     : 6126.283
##  ARIMA(3,0,1) with non-zero mean : 6095.148
##  ARIMA(3,0,2) with zero mean     : 6130.203
##  ARIMA(3,0,2) with non-zero mean : 6082.345
##  ARIMA(4,0,0) with zero mean     : 6125.44
##  ARIMA(4,0,0) with non-zero mean : 6089.931
##  ARIMA(4,0,1) with zero mean     : 6121.146
##  ARIMA(4,0,1) with non-zero mean : 6086.197
##  ARIMA(5,0,0) with zero mean     : 6101.778
##  ARIMA(5,0,0) with non-zero mean : 6079.455
## 
## 
## 
##  Best model: ARIMA(0,0,4) with non-zero mean 
## 
## [1] "input_series=ts(data$sold_count,freq=16)"
##             variable  n     mean       sd        CV      FBias      MAPE
## 1:    lm_prediction2 14 412.4286 232.3915 0.5634709 -4.6142064 5.1883257
## 2:    lm_prediction3 14 412.4286 232.3915 0.5634709 -4.7166790 5.3069999
## 3:    lm_prediction4 14 412.4286 232.3915 0.5634709 -4.5580423 5.0437342
## 4:    lm_prediction5 14 412.4286 232.3915 0.5634709  0.1458778 0.6688978
## 5:    lm_prediction6 14 412.4286 232.3915 0.5634709 -4.4771036 4.9528052
## 6:  arima_prediction 14 412.4286 232.3915 0.5634709 -0.3479633 0.7460511
## 7: sarima_prediction 14 412.4286 232.3915 0.5634709 -0.2817767 0.6716086
## 8:    selected_arima 14 412.4286 232.3915 0.5634709  0.1967600 0.6791326
##         RMSE       MAD      MADP     WMAPE
## 1: 2184.1016 1903.0305 4.6142064 4.6142064
## 2: 2238.7716 1945.2932 4.7166790 4.7166790
## 3: 2165.7375 1879.8669 4.5580423 4.5580423
## 4:  303.9193  245.5246 0.5953143 0.5953143
## 5: 2125.5963 1846.4854 4.4771036 4.4771036
## 6:  221.0945  188.7099 0.4575578 0.4575578
## 7:  203.6424  168.6188 0.4088437 0.4088437
## 8:  276.4342  230.1284 0.5579837 0.5579837

The smallest Weighted Mean Absolute Percentage Error is obtained with ARIMA(0,0,4) on the series decomposed with a 16-day frequency, which is also the model that auto.arima suggested. This model is therefore selected for our prediction purposes.
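As a reference for the selection metric above, WMAPE is the total absolute error divided by the total actual sales, so high-volume days weigh more than in plain MAPE. A minimal sketch with illustrative numbers (not the report's data):

```r
# WMAPE = sum(|actual - predicted|) / sum(actual)
wmape <- function(actual, predicted) {
  sum(abs(actual - predicted)) / sum(actual)
}

actual    <- c(400, 350, 500)   # hypothetical daily sold counts
predicted <- c(380, 360, 520)   # hypothetical forecasts

wmape(actual, predicted)  # 0.04
```

Because the denominator is the sum of actuals, a 20-unit miss on a 500-unit day is penalized less, relatively, than the same miss on a 50-unit day would be under plain MAPE.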

To conclude, here is a plot of the actual test set and the predicted values of the chosen model. As can be seen, the predictions are not far off.

One Day Ahead Prediction with the Selected Model for Product 5

With the selected model, a one-day-ahead prediction can be made using all the data on hand, since a one-day-ahead forecast is what must be submitted in this competition.
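The refit-and-forecast step can be sketched as follows. This is a simplified illustration on a synthetic series, not the report's exact code: the real model is fit to the detrended sales and also carries the is_campaign regressor (which would require a `newxreg` argument in `predict`).

```r
# Refit the chosen ARIMA(0,0,4) on the full series, then forecast one step.
set.seed(1)
full_series <- ts(100 + arima.sim(list(ma = c(0.5, 0.3, 0.1, 0.05)), n = 200),
                  frequency = 16)

fit <- arima(full_series, order = c(0, 0, 4))

one_ahead <- predict(fit, n.ahead = 1)
one_ahead$pred  # point forecast for the next day
one_ahead$se    # its standard error
```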

##    price event_date product_content_id sold_count visit_count favored_count
## 1: 51.77 2021-07-02           31515569        267        6757           345
##    basket_count category_sold category_brand_sold category_visits ty_visits
## 1:         1075          6486                 887          383610  99819109
##    category_basket category_favored w_day mon is_campaign
## 1:           34514            33905     6   7           0
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.4811 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0231 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

## 
## Call:
## arima(x = detrend2, order = c(0, 0, 4), xreg = data_31515569$is_campaign, include.mean = TRUE)
## 
## Coefficients:
##          ma1     ma2     ma3      ma4  intercept  data_31515569$is_campaign
##       0.7895  0.5018  0.2625  -0.0056     0.9159                     0.6703
## s.e.  0.0537  0.0718  0.0721   0.0583     0.0541                     0.1014
## 
## sigma^2 estimated as 0.1694:  log likelihood = -206.46,  aic = 426.92
## [1] 426.9219
## [1] 454.649
## Time Series:
## Start = c(26, 5) 
## End = c(26, 5) 
## Frequency = 16 
## [1] 257.4225
##    price event_date product_content_id sold_count visit_count favored_count
## 1: 51.77 2021-07-04           31515569        267        6757           345
##    basket_count category_sold category_brand_sold category_visits ty_visits
## 1:         1075          6486                 887          383610  99819109
##    category_basket category_favored w_day mon is_campaign arima1_prediction
## 1:           34514            33905     6   7           0          257.4225

PRODUCT 6 - TrendyolMilla Bikini Top

Before building forecasting models for Product 6, the data should be plotted and its seasonality and trend examined. Below, you can see the plot of the sales quantity of Product 6. Missing sold counts are filled with the mean of the data. There is a slightly increasing trend, especially at the beginning and end of the plot, and no significant seasonality is visible. To look further, three months of 2021 - March, April and May - are plotted. Again, the seasonality is not significant. In conclusion, it can be said that there is no seasonality.
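The mean imputation mentioned above can be sketched in one line (hypothetical values, not the product's data):

```r
# Replace missing sold counts with the mean of the observed ones.
sold <- c(12, NA, 18, 25, NA, 30)
sold[is.na(sold)] <- mean(sold, na.rm = TRUE)
sold  # 12.00 21.25 18.00 25.00 21.25 30.00
```

Mean imputation keeps the series complete for decomposition and ARIMA fitting, at the cost of flattening any pattern on the missing days.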

Linear Regression Model For Product 6

The first type of model to be used is a linear regression model. First of all, it is wise to select the attributes that will help the model from the correlation matrix. Below, you can see the correlations between the attributes. According to this matrix, only basket_count should be added to the model.

In the first model, this attribute is added. The adjusted R-squared value indicates how well the model fits; for the first model it is fairly high, which is a good sign. However, there are outliers, probably due to campaigns and holidays, which can be handled with dummy variables for a better model. Lastly, a ‘lag1’ attribute is added because lag 1 is very high in the ACF. In the final linear regression model, the adjusted R-squared value is high enough and the diagnostic plots are good enough to make predictions.
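Adding the lagged predictor can be sketched as below (synthetic data; the report's model additionally includes the big_outlier and small_outlier dummies shown in the output that follows):

```r
set.seed(7)
n <- 100
sold <- data.frame(sold_count   = rpois(n, 30),
                   basket_count = rpois(n, 240))

# lag1 = previous day's sold_count; the first row has no predecessor.
sold$lag1 <- c(NA, sold$sold_count[-n])

# lm() silently drops the one row with NA in lag1.
fit <- lm(sold_count ~ lag1 + basket_count, data = sold)
coef(fit)  # intercept, lag1, basket_count
```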

## 
## Call:
## lm(formula = sold_count ~ basket_count, data = sold)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.072  -1.754   1.148   1.148  22.764 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.999774   0.936931   10.67   <2e-16 ***
## basket_count  0.126031   0.005347   23.57   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.752 on 367 degrees of freedom
## Multiple R-squared:  0.6022, Adjusted R-squared:  0.6011 
## F-statistic: 555.6 on 1 and 367 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 66.191, df = 10, p-value = 2.398e-10

##    sold_count   
##  Min.   : 1.00  
##  1st Qu.:32.00  
##  Median :32.89  
##  Mean   :30.47  
##  3rd Qu.:32.89  
##  Max.   :81.00
## 
## Call:
## lm(formula = sold_count ~ big_outlier + small_outlier + basket_count, 
##     data = sold)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.3275  -0.3587  -0.3587  -0.3587  18.0575 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    21.95803    0.75528   29.07   <2e-16 ***
## big_outlier     8.24784    0.77985   10.58   <2e-16 ***
## small_outlier -13.21226    0.58250  -22.68   <2e-16 ***
## basket_count    0.06545    0.00424   15.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.156 on 365 degrees of freedom
## Multiple R-squared:  0.8501, Adjusted R-squared:  0.8489 
## F-statistic: 690.2 on 3 and 365 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 21.851, df = 10, p-value = 0.01588
## 
## Call:
## lm(formula = sold_count ~ lag1 + big_outlier + small_outlier + 
##     basket_count, data = sold)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.9457  -0.3268  -0.3268  -0.3268  15.8896 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    22.084918   0.741831  29.771  < 2e-16 ***
## lag1            0.201067   0.051769   3.884 0.000122 ***
## big_outlier     8.280673   0.765271  10.821  < 2e-16 ***
## small_outlier -13.436279   0.574476 -23.389  < 2e-16 ***
## basket_count    0.064946   0.004162  15.603  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.078 on 364 degrees of freedom
## Multiple R-squared:  0.8561, Adjusted R-squared:  0.8545 
## F-statistic: 541.4 on 4 and 364 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 10.558, df = 10, p-value = 0.3929

ARIMA Model For Product 6

The second type of model to be built is an ARIMA model. First, the data should be decomposed, which requires choosing a frequency. Since there is no significant seasonality, the highest value in the ACF, 9, is chosen. Additive decomposition is used for this task. Below, the random component can be seen.
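The decomposition step can be sketched as follows, with a synthetic series standing in for the sold counts:

```r
# Build a ts with the chosen frequency of 9 and decompose additively.
set.seed(42)
x <- ts(30 + 0.02 * (1:360) + rnorm(360, sd = 5), frequency = 9)

dec <- decompose(x, type = "additive")

# The random (remainder) component is what the ARIMA model is fit to;
# decompose() leaves NAs at both ends from the moving-average trend filter.
detrend <- dec$random
```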

After the decomposition, the (p,d,q) orders should be chosen. For this, the ACF and PACF are examined: the ACF suggests q = 3, and the PACF suggests p = 3 or 6. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below; among them, the (6,0,3) model is best. After the model is selected, the regressors that correlate most with the sold count are added to improve it. The final model has lower AIC and BIC values, so we can proceed with it.

## 
## Call:
## arima(x = detrend, order = c(3, 0, 3))
## 
## Coefficients:
##          ar1      ar2      ar3      ma1      ma2     ma3  intercept
##       0.8921  -0.0218  -0.3943  -1.1329  -0.1835  0.3164    -0.0022
## s.e.  0.5828   0.8229   0.4372   0.5875   0.9698  0.3877     0.0030
## 
## sigma^2 estimated as 31.97:  log likelihood = -1141.24,  aic = 2298.49
## [1] 2298.488
## [1] 2329.599
## 
## Call:
## arima(x = detrend, order = c(6, 0, 3))
## 
## Coefficients:
##          ar1     ar2      ar3      ar4      ar5      ar6      ma1      ma2
##       0.3835  0.1168  -0.3827  -0.0999  -0.0047  -0.2076  -0.6080  -0.4489
## s.e.  0.2537  0.2274   0.1682   0.0954   0.0762   0.0602   0.2604   0.2583
##          ma3  intercept
##       0.0569    -0.0022
## s.e.  0.2284     0.0032
## 
## sigma^2 estimated as 31.25:  log likelihood = -1137.14,  aic = 2296.28
## [1] 2296.28
## [1] 2339.058
## Series: detrend 
## ARIMA(0,0,1) with non-zero mean 
## 
## Coefficients:
##          ma1     mean
##       0.2197  -0.0189
## s.e.  0.0486   0.4641
## 
## sigma^2 estimated as 52.61:  log likelihood=-1226.57
## AIC=2459.15   AICc=2459.22   BIC=2470.81
## [1] 2459.148
## [1] 2470.814
## 
## Call:
## arima(x = detrend, order = c(6, 0, 3), xreg = xreg)
## 
## Coefficients:
##          ar1     ar2      ar3     ar4     ar5      ar6      ma1      ma2
##       0.6479  0.2397  -0.6091  0.0157  0.0723  -0.1585  -0.8846  -0.5082
## s.e.  0.2398  0.2883   0.2012  0.0882  0.0822   0.0806   0.2410   0.3167
##          ma3  intercept    xreg
##       0.4353    -0.3086  0.0018
## s.e.  0.2588     0.1347  0.0008
## 
## sigma^2 estimated as 30.83:  log likelihood = -1133.08,  aic = 2290.16
## [1] 2290.163
## [1] 2336.829

Comparison Of Models

We selected two models for prediction; their accuracy values can be seen here. According to the box plot, the weighted mean absolute percentage error of the ARIMA model is higher, so the linear model should be chosen: its lower WMAPE is a sign of a better model.

##          variable  n     mean       sd       CV      FBias      MAPE     RMSE
## 1:  lm_prediction 14 50.71429 11.75015 0.231693 0.07376177 0.1491078 10.37096
## 2: selected_arima 14 50.71429 11.75015 0.231693 0.05527895 0.2515232 15.59437
##          MAD      MADP     WMAPE
## 1:  8.006217 0.1578691 0.1578691
## 2: 12.673371 0.2498975 0.2498975

To conclude, here is a plot of the actual test set and the predicted values of the chosen model. As can be seen, the predictions are quite accurate.

PRODUCT 7 - Oral-B Rechargeable ToothBrush

First of all, the general behaviour of the data over time is examined with a time plot.

Secondly, the distribution across days and months is plotted to see whether sales depend on the month and the day.

Finally, the relationship with previous observations is examined through the ACF and PACF graphs.

It can be said that there is a trend in the data, and once the trend is excluded, the autocorrelation at lag 1, lag 3 and lag 7 is significant.

The box plots show that the data depends on month and day factors. Since the day factor is significant, it will be used in model construction instead of lag 7, and the frequency of the data is set to 7.
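Encoding the weekday effect as a factor regressor, rather than a lag-7 term, can be sketched like this (hypothetical data; w_day runs 1-7 as in the summary below):

```r
set.seed(3)
n <- 140
dat <- data.frame(w_day      = factor(rep(1:7, length.out = n)),
                  sold_count = rpois(n, 95))

# factor(w_day) expands into 6 weekday dummies plus the intercept,
# giving each day of the week its own level instead of one lag-7 term.
fit <- lm(sold_count ~ w_day, data = dat)
length(coef(fit))  # 7
```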

Examination of Attributes

Some of the attributes are not reliable, so they are examined through the summary of the data.

##      price         event_date         product_content_id   sold_count    
##  Min.   :110.1   Min.   :2020-05-25   Length:405         Min.   :  0.00  
##  1st Qu.:129.9   1st Qu.:2020-09-03   Class :character   1st Qu.: 20.00  
##  Median :136.3   Median :2020-12-13   Mode  :character   Median : 57.00  
##  Mean   :135.3   Mean   :2020-12-13                      Mean   : 94.91  
##  3rd Qu.:141.6   3rd Qu.:2021-03-24                      3rd Qu.:139.00  
##  Max.   :165.9   Max.   :2021-07-03                      Max.   :513.00  
##  NA's   :9                                                               
##   visit_count    favored_count   basket_count    category_sold 
##  Min.   :    0   Min.   :   0   Min.   :   0.0   Min.   : 321  
##  1st Qu.:    0   1st Qu.:   0   1st Qu.:  92.0   1st Qu.: 610  
##  Median :    0   Median : 175   Median : 240.0   Median : 802  
##  Mean   : 2267   Mean   : 356   Mean   : 399.2   Mean   :1008  
##  3rd Qu.: 4265   3rd Qu.: 588   3rd Qu.: 578.0   3rd Qu.:1099  
##  Max.   :15725   Max.   :2696   Max.   :2249.0   Max.   :5557  
##                                                                
##  category_brand_sold category_visits   ty_visits         category_basket 
##  Min.   :    0       Min.   :  346   Min.   :        1   Min.   :     0  
##  1st Qu.:    0       1st Qu.:  657   1st Qu.:        1   1st Qu.:     0  
##  Median :  693       Median :  880   Median :        1   Median :     0  
##  Mean   : 2991       Mean   : 3896   Mean   : 44737307   Mean   : 18591  
##  3rd Qu.: 5354       3rd Qu.: 1349   3rd Qu.:102143446   3rd Qu.: 41265  
##  Max.   :28944       Max.   :59310   Max.   :178545693   Max.   :281022  
##                                                                          
##  category_favored     w_day            mon          is_campaign     
##  Min.   : 1242    Min.   :1.000   Min.   : 1.000   Min.   :0.00000  
##  1st Qu.: 2476    1st Qu.:2.000   1st Qu.: 4.000   1st Qu.:0.00000  
##  Median : 3298    Median :4.000   Median : 6.000   Median :0.00000  
##  Mean   : 4202    Mean   :4.007   Mean   : 6.464   Mean   :0.08642  
##  3rd Qu.: 4869    3rd Qu.:6.000   3rd Qu.: 9.000   3rd Qu.:0.00000  
##  Max.   :44445    Max.   :7.000   Max.   :12.000   Max.   :1.00000  
## 
##         price sold_count visit_count favored_count basket_count category_sold
## [1,] 112.9000          0           0             0            0           321
## [2,] 129.9000         20           0             0           92           610
## [3,] 136.2828         57           0           175          240           802
## [4,] 141.6109        139        4265           588          578          1099
## [5,] 158.1300        315       10646          1465         1287          1799
##      category_brand_sold category_visits ty_visits category_basket
## [1,]                   0             346         1               0
## [2,]                   0             657         1               0
## [3,]                 693             880         1               0
## [4,]                5354            1349 102143446           41265
## [5,]               12868            2348 178545693           95301
##      category_favored w_day
## [1,]             1242     1
## [2,]             2476     2
## [3,]             3298     4
## [4,]             4869     6
## [5,]             8278     7

The relationship between the attributes and the response variable is examined via the correlation graph.

Basket_count, category_visits, and category_favored have high correlations with the response and look reliable in the data summary. However, there are zero values that are not expected in real life; therefore, the zero values are replaced with the mean.

ty_visits also takes the placeholder value 1 before a particular date, and those values are replaced with the mean of ty_visits.

Some price values are NA; they are replaced with the mean price, since the price does not change significantly over time.
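The cleaning steps above can be sketched as follows. This is a minimal illustration with made-up values; the table and column contents are assumptions based on the description, only the column names follow the report.

```r
library(data.table)

# hypothetical daily product data; column names follow the report
dt <- data.table(
  price        = c(112.9, NA, 136.3, NA, 158.1),
  basket_count = c(0, 92, 240, 578, 1287),
  ty_visits    = c(1, 1, 1, 102143446, 178545693)
)

# replace unrealistic zeros with the mean of the nonzero values
dt[basket_count == 0,
   basket_count := mean(dt$basket_count[dt$basket_count > 0])]

# ty_visits is 1 before a certain date; replace those placeholders with the mean
dt[ty_visits == 1, ty_visits := mean(dt$ty_visits[dt$ty_visits > 1])]

# NA prices are filled with the mean price, since price is fairly stable
dt[is.na(price), price := mean(dt$price, na.rm = TRUE)]
```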

In the end, “price”, “visit_count”, “basket_count”, “category_favored”, “ty_visits”, and “is_campaign” are determined as regressors.

The data will be predicted based on the attributes of previous observations, since the actual attribute values are not available at prediction time.

Model Construction

The data does not have constant variance; therefore, besides the simple linear model, sqrt and BoxCox transformations of the response are also used for the regression model.
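Building the transformed responses might look like the sketch below, assuming the `forecast` package is used to estimate the Box-Cox λ (the vector of sales values is illustrative):

```r
library(forecast)

sold <- c(10, 25, 57, 139, 315)   # illustrative sold_count values

sqrt_y   <- sqrt(sold)            # variance-stabilizing sqrt transform
lambda   <- BoxCox.lambda(sold)   # estimate the Box-Cox lambda from the data
boxcox_y <- BoxCox(sold, lambda)  # transformed response for the lm

# after fitting lm() on the transformed response, forecasts are back-transformed:
# sqrt model: fitted^2 ; BoxCox model: InvBoxCox(fitted, lambda)
```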

Simple Linear Regression with No Transformation

By many iterations, it is seen that the day-of-week factor is not significant, as expected.

## 
## Call:
## lm(formula = sold_count ~ price + visit_count + basket_count + 
##     category_basket + factor(mon) + factor(is_campaign) + trend + 
##     lag1 + lag3, data = train7)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -121.274   -9.266   -0.198    7.729  121.941 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           6.364e+01  3.324e+01   1.914 0.056364 .  
## price                -7.582e-01  2.381e-01  -3.184 0.001576 ** 
## visit_count          -1.032e-02  1.641e-03  -6.287 9.13e-10 ***
## basket_count          2.258e-01  1.012e-02  22.312  < 2e-16 ***
## category_basket       2.713e-04  8.129e-05   3.338 0.000931 ***
## factor(mon)2         -8.347e+00  8.170e+00  -1.022 0.307592    
## factor(mon)3         -1.863e+01  7.361e+00  -2.530 0.011807 *  
## factor(mon)4         -1.232e+01  8.375e+00  -1.471 0.142073    
## factor(mon)5          2.585e+01  8.131e+00   3.179 0.001602 ** 
## factor(mon)6          2.282e+01  6.630e+00   3.442 0.000643 ***
## factor(mon)7          2.496e+01  7.462e+00   3.344 0.000909 ***
## factor(mon)8          1.643e+01  7.158e+00   2.296 0.022238 *  
## factor(mon)9         -1.057e+00  7.680e+00  -0.138 0.890645    
## factor(mon)10        -5.613e-01  6.755e+00  -0.083 0.933815    
## factor(mon)11         4.073e+00  6.533e+00   0.623 0.533369    
## factor(mon)12        -3.015e+00  5.780e+00  -0.522 0.602269    
## factor(is_campaign)1  7.251e-01  4.654e+00   0.156 0.876270    
## trend                 1.782e-01  2.499e-02   7.130 5.32e-12 ***
## lag1                  1.754e-01  2.743e-02   6.395 4.84e-10 ***
## lag3                  4.160e-02  2.170e-02   1.917 0.056001 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.41 on 370 degrees of freedom
## Multiple R-squared:  0.9504, Adjusted R-squared:  0.9478 
## F-statistic: 372.9 on 19 and 370 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 23
## 
## data:  Residuals
## LM test = 58.281, df = 23, p-value = 6.744e-05

The residuals of the lm model are centered around mean zero, although the Breusch-Godfrey test indicates remaining serial correlation, and the variability of the errors is higher at larger values.
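The Breusch-Godfrey check reported above can be reproduced with `lmtest::bgtest`; the fitted model below is a stand-in for the report's sold_count regression:

```r
library(lmtest)

# stand-in regression on a built-in dataset, in place of the report's model
fit <- lm(dist ~ speed, data = cars)

# Breusch-Godfrey test for serial correlation of order up to 23;
# a small p-value indicates autocorrelated residuals
bgtest(fit, order = 23)
```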

Simple Linear Regression with sqrt() Transformation

By many iterations, the regressors in the model call below were found significant for the sqrt-transformed response.

## 
## Call:
## lm(formula = sqrt ~ price + visit_count + basket_count + ty_visits + 
##     factor(mon) + lag1 + factor(is_campaign) + category_visits + 
##     category_basket, data = train7)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0045 -0.6253  0.0536  0.6764  3.7677 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1.418e+01  1.818e+00   7.804 6.22e-14 ***
## price                -7.325e-02  1.230e-02  -5.956 6.03e-09 ***
## visit_count          -9.906e-04  9.360e-05 -10.583  < 2e-16 ***
## basket_count          1.133e-02  5.471e-04  20.703  < 2e-16 ***
## ty_visits             4.269e-08  4.811e-09   8.875  < 2e-16 ***
## factor(mon)2         -3.187e-01  4.807e-01  -0.663 0.507661    
## factor(mon)3         -4.962e-01  4.251e-01  -1.167 0.243875    
## factor(mon)4         -5.304e-01  4.786e-01  -1.108 0.268413    
## factor(mon)5         -7.138e-01  4.597e-01  -1.553 0.121313    
## factor(mon)6         -1.599e+00  3.481e-01  -4.594 5.98e-06 ***
## factor(mon)7         -1.911e+00  3.513e-01  -5.440 9.70e-08 ***
## factor(mon)8         -1.667e+00  3.690e-01  -4.517 8.44e-06 ***
## factor(mon)9         -1.600e+00  4.209e-01  -3.801 0.000169 ***
## factor(mon)10        -1.698e+00  3.656e-01  -4.645 4.74e-06 ***
## factor(mon)11        -1.202e+00  3.547e-01  -3.388 0.000779 ***
## factor(mon)12        -3.743e-01  3.134e-01  -1.194 0.233053    
## lag1                  1.217e-02  1.313e-03   9.275  < 2e-16 ***
## factor(is_campaign)1 -2.161e-01  2.537e-01  -0.852 0.395006    
## category_visits       4.162e-05  1.234e-05   3.371 0.000827 ***
## category_basket      -1.017e-05  4.776e-06  -2.129 0.033882 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.217 on 370 degrees of freedom
## Multiple R-squared:  0.9399, Adjusted R-squared:  0.9368 
## F-statistic: 304.7 on 19 and 370 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 23
## 
## data:  Residuals
## LM test = 95.006, df = 23, p-value = 1.012e-10

The residual analysis shows significant autocorrelation at lag 1, although the residuals are centered around mean zero; the variability of the errors is again higher at larger values. This model is poorer than the lm model with no transformation.

Simple Linear Regression with BoxCox Transformation

By many iterations, it is seen that the day-of-week factor and lag3 are not significant for the BoxCox linear model, while lag7 is; therefore, the insignificant terms are excluded. category_basket is significant for the BoxCox transformation.

## 
## Call:
## lm(formula = BoxCox ~ price + visit_count + basket_count + category_favored + 
##     ty_visits + factor(mon) + lag1 + lag7 + factor(is_campaign) + 
##     category_basket, data = train7)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.7760 -0.3972  0.1812  0.6479  2.3175 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1.278e+01  2.177e+00   5.872 9.63e-09 ***
## price                -6.817e-02  1.482e-02  -4.599 5.85e-06 ***
## visit_count          -5.974e-04  1.195e-04  -4.999 8.92e-07 ***
## basket_count          5.067e-03  7.149e-04   7.087 7.01e-12 ***
## category_favored      1.369e-04  4.166e-05   3.286  0.00111 ** 
## ty_visits             4.048e-08  4.666e-09   8.677  < 2e-16 ***
## factor(mon)2          2.373e-01  5.810e-01   0.409  0.68311    
## factor(mon)3         -5.853e-02  5.009e-01  -0.117  0.90704    
## factor(mon)4         -7.257e-01  5.495e-01  -1.321  0.18744    
## factor(mon)5         -1.649e+00  5.416e-01  -3.045  0.00249 ** 
## factor(mon)6         -2.041e+00  4.065e-01  -5.021 8.01e-07 ***
## factor(mon)7         -2.185e+00  4.234e-01  -5.160 4.04e-07 ***
## factor(mon)8         -1.384e+00  4.420e-01  -3.132  0.00187 ** 
## factor(mon)9         -1.402e+00  5.038e-01  -2.783  0.00566 ** 
## factor(mon)10        -1.334e+00  4.359e-01  -3.060  0.00237 ** 
## factor(mon)11        -1.229e+00  4.308e-01  -2.852  0.00459 ** 
## factor(mon)12        -2.598e-01  3.750e-01  -0.693  0.48882    
## lag1                  6.648e-03  1.573e-03   4.227 2.98e-05 ***
## lag7                  2.820e-03  1.239e-03   2.277  0.02338 *  
## factor(is_campaign)1 -3.815e-01  3.167e-01  -1.205  0.22912    
## category_basket      -3.401e-05  7.166e-06  -4.746 2.97e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.452 on 369 degrees of freedom
## Multiple R-squared:  0.7651, Adjusted R-squared:  0.7523 
## F-statistic: 60.08 on 20 and 369 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 24
## 
## data:  Residuals
## LM test = 197.94, df = 24, p-value < 2.2e-16

By residual analysis, the BoxCox model shows large deviations over time, and its adjusted R-squared value is lower than those of the other models.

Arima Models

When the ARIMA models are constructed, the auto.arima function is used and is re-run every day as new observations arrive. Seasonality is set to TRUE, and the frequency is determined as seven by observing the ACF and PACF graphs.
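A minimal sketch of this daily re-fitting step, with an illustrative simulated series in place of the real sales data:

```r
library(forecast)

# hypothetical training series; frequency 7 captures the weekly pattern
set.seed(1)
y <- ts(rpois(120, lambda = 50), frequency = 7)

fit <- auto.arima(y, seasonal = TRUE)  # re-run each day as new data arrives
fc  <- forecast(fit, h = 1)            # one-day-ahead forecast
```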

An additive model, a multiplicative model, and a linear regression model are used for decomposition to obtain stationary data.

## [1] "The Additive Model"
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.0069
## [1] "The Multiplicative Model"
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.2127
## [1] "Linear Regression"
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.0244

Among the three, the multiplicative model has the largest KPSS test statistic, so I will use the additive decomposition for the ARIMA and ARIMA-with-regressors models.
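The KPSS statistics above come from a unit-root test on each decomposition's remainder component; with the `urca` package this looks like the sketch below (the built-in `co2` series stands in for the sales data):

```r
library(urca)

# stand-in remainder: the random component of a classical decomposition
random_part <- decompose(co2)$random

# KPSS test; the null hypothesis is stationarity, so a small test
# statistic supports treating the remainder as stationary
summary(ur.kpss(na.omit(random_part)))
```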

The linear regression model's residuals are stationary; therefore, an ARIMA model is fitted on these residuals, and the regression and ARIMA forecasts are combined at the end.
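The combination described above can be sketched as: fit the regression, model its residuals with auto.arima, and sum the two forecasts. All data and variable names here are illustrative:

```r
library(forecast)

# illustrative data: a trend plus a regressor plus MA(1) noise
set.seed(1)
n  <- 120
df <- data.frame(x = rnorm(n), t = 1:n)
df$y <- 5 + 2 * df$x + 0.1 * df$t + as.numeric(arima.sim(list(ma = 0.3), n))

lm_fit  <- lm(y ~ x + t, data = df)     # regression part
res_fit <- auto.arima(resid(lm_fit))    # ARIMA on the regression residuals

# combined one-step-ahead forecast: regression forecast + residual forecast
new_x    <- data.frame(x = 0.5, t = n + 1)
combined <- predict(lm_fit, new_x) + forecast(res_fit, h = 1)$mean[1]
```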

The regressors mentioned above are used for the ARIMA model with regressors.

Arima Model

## Series: decomposed$random 
## ARIMA(0,0,1)(0,0,2)[7] with non-zero mean 
## 
## Coefficients:
##          ma1    sma1     sma2    mean
##       0.3325  0.0897  -0.0993  0.2025
## s.e.  0.0468  0.0526   0.0522  2.2903
## 
## sigma^2 estimated as 1165:  log likelihood=-1898.69
## AIC=3807.38   AICc=3807.54   BIC=3827.13

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(0,0,2)[7] with non-zero mean
## Q* = 52.251, df = 10, p-value = 1.025e-07
## 
## Model df: 4.   Total lags used: 14

By observing the ACF and PACF, the PACF is significant at lag 1 and the ACF drops after lag 1; therefore, it is reasonable that auto.arima gives an MA(1). The seasonal ACF and PACF are significant at seasonal lag 2, so the seasonal order (0,0,2) is reasonable as well.

## Series: decomposed$random 
## Regression with ARIMA(5,1,1) errors 
## 
## Coefficients:
##         ar1      ar2      ar3      ar4      ar5      ma1     xreg
##       0.174  -0.3550  -0.2993  -0.0638  -0.2540  -0.9827  -0.4743
## s.e.  0.050   0.0506   0.0515   0.0506   0.0505   0.0133   0.2082
## 
## sigma^2 estimated as 940.2:  log likelihood=-1853.64
## AIC=3723.27   AICc=3723.66   BIC=3754.85

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(5,1,1) errors
## Q* = 19.569, df = 7, p-value = 0.00658
## 
## Model df: 7.   Total lags used: 14
## [1] 3723.27

By residual analysis, the ARIMA with regressors has weaker residual autocorrelation and a lower AIC; therefore, it is a better model than the plain ARIMA.

Arima combined with linear Regression

## Series: residuals 
## ARIMA(0,0,3) with zero mean 
## 
## Coefficients:
##          ma1     ma2     ma3
##       0.1687  0.1556  0.0900
## s.e.  0.0504  0.0510  0.0537
## 
## sigma^2 estimated as 454:  log likelihood=-1744.93
## AIC=3497.86   AICc=3497.97   BIC=3513.73

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,3) with zero mean
## Q* = 3.6185, df = 7, p-value = 0.8225
## 
## Model df: 3.   Total lags used: 10

The auto.arima model fitted on the regression residuals has zero mean, no significant residual autocorrelation, and the lowest AIC value; by residual analysis it is better than both the ARIMA and the ARIMA-with-regressors models.

Predictions

The predictions are based on the last available attributes, and they are plotted together with the actual sales values.

##     event_date actual sqrt_forecasted_sold BoxCox_forecasted_sold
##  1: 2021-06-19    104             85.19013               46.48724
##  2: 2021-06-20    149            142.34921              105.17396
##  3: 2021-06-21    128            116.80927              111.63219
##  4: 2021-06-22     56             97.96816               82.82714
##  5: 2021-06-23     59             65.41235               53.70646
##  6: 2021-06-24     56             63.05931               51.09164
##  7: 2021-06-25     36             55.53599               42.38736
##  8: 2021-06-26     40             52.72843               38.58299
##  9: 2021-06-27     46             72.90855               71.13411
## 10: 2021-06-28     64             73.59749               64.87460
## 11: 2021-06-29    137            120.37899              111.99730
## 12: 2021-06-30    131            133.14419              127.36637
## 13: 2021-07-01    130            106.68538               86.68351
## 14: 2021-07-02    108             97.23984               79.91181
##     lm_forecasted_sold forecasted_lm7_arima add_arima_forecasted
##  1:          122.10110            116.96870            151.61943
##  2:          156.84887            152.07278            145.17374
##  3:          130.65940            126.83047            158.80660
##  4:          108.49373            107.28933            154.81502
##  5:           82.48148             74.54122            134.35688
##  6:           77.32606             67.77444            120.78775
##  7:           69.69560             61.49401             97.99568
##  8:           70.19297             63.37929             74.91833
##  9:           67.03113             58.76189             57.04774
## 10:           79.05741             71.53800             52.04136
## 11:          131.06600            126.11470             55.21875
## 12:          140.90974            140.62073             74.90336
## 13:          139.55899            139.03424             87.46261
## 14:          124.65919            127.33803             86.98584
##     reg_add_arima_forecasted
##  1:                152.31646
##  2:                145.80627
##  3:                176.87708
##  4:                175.92230
##  5:                140.35727
##  6:                109.98016
##  7:                 94.61315
##  8:                 79.87082
##  9:                 65.01756
## 10:                 56.92829
## 11:                 51.75949
## 12:                 68.10496
## 13:                 84.87163
## 14:                 87.40229

Error Rates of the Models

##                       model  n     mean       sd        CV       FBias
## 1:     sqrt_forecasted_sold 14 88.85714 41.34072 0.4652492 -0.03135634
## 2:   BoxCox_forecasted_sold 14 88.85714 41.34072 0.4652492  0.13677116
## 3:       lm_forecasted_sold 14 88.85714 41.34072 0.4652492 -0.20585343
## 4:     forecasted_lm7_arima 14 88.85714 41.34072 0.4652492 -0.15253843
## 5:     add_arima_forecasted 14 88.85714 41.34072 0.4652492 -0.16730956
## 6: reg_add_arima_forecasted 14 88.85714 41.34072 0.4652492 -0.19761071
##         MAPE     RMSE      MAD      MADP     WMAPE
## 1: 0.2363980 18.27491 15.26440 0.1717859 0.1717859
## 2: 0.2291337 27.05425 20.61355 0.2319853 0.2319853
## 3: 0.3352665 22.95409 19.13926 0.2153936 0.2153936
## 4: 0.2595226 19.40657 15.27625 0.1719192 0.1719192
## 5: 0.6779985 53.64278 45.89727 0.5165288 0.5165288
## 6: 0.7243721 58.41190 49.57728 0.5579436 0.5579436

Since the lm+ARIMA combined model has one of the lowest WMAPE values (essentially tied with the sqrt model), it is selected for prediction. However, the error rates over the last 14 days are recalculated every day, and the model with the lowest WMAPE on that window is selected for each day's prediction.
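The daily selection rule can be sketched as below: WMAPE = Σ|actual − forecast| / Σ|actual|, computed over the trailing 14 days for each candidate model. The function and argument names are illustrative:

```r
# weighted mean absolute percentage error
wmape <- function(actual, forecast) {
  sum(abs(actual - forecast)) / sum(abs(actual))
}

# pick the best model over the last `window` days; `preds` is a named list
# of forecast vectors aligned with `actual` (names are illustrative)
select_model <- function(actual, preds, window = 14) {
  idx    <- tail(seq_along(actual), window)
  scores <- sapply(preds, function(p) wmape(actual[idx], p[idx]))
  names(which.min(scores))
}
```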

Predictions for the Next Day

##         add_arima    xreg_add_arima       forecast_lm forecast_lm_arima 
##          85.54331          89.25217         118.33530         120.30736 
##         BoxCox_lm           Sqrt_lm 
##          78.85804          94.00000

Product 8 - Altınyıldız Classics Jacket

It can be seen that sales are zero most of the time; however, there is a huge increase in October.

The ACF and PACF of the data show significant autocorrelation at lag 1 and lag 7.

Examination of Attributes

The correlation of price, visit_count, and basket_count is high, and it is expected that these variables can be zero when sold_count is zero.

However, it is not expected that category_favored and ty_visits are zero or one; therefore, these values are replaced with the mean.

##      price         event_date         product_content_id   sold_count     
##  Min.   : -1.0   Min.   :2020-05-25   Length:405         Min.   : 0.0000  
##  1st Qu.:350.0   1st Qu.:2020-09-03   Class :character   1st Qu.: 0.0000  
##  Median :600.0   Median :2020-12-13   Mode  :character   Median : 0.0000  
##  Mean   :559.3   Mean   :2020-12-13                      Mean   : 0.9284  
##  3rd Qu.:734.3   3rd Qu.:2021-03-24                      3rd Qu.: 0.0000  
##  Max.   :833.3   Max.   :2021-07-03                      Max.   :52.0000  
##  NA's   :303                                                              
##   visit_count     favored_count     basket_count    category_sold   
##  Min.   :  0.00   Min.   : 0.000   Min.   :  0.00   Min.   :   0.0  
##  1st Qu.:  0.00   1st Qu.: 0.000   1st Qu.:  0.00   1st Qu.:  16.0  
##  Median :  0.00   Median : 0.000   Median :  0.00   Median :  45.0  
##  Mean   : 27.24   Mean   : 2.242   Mean   :  5.83   Mean   : 200.2  
##  3rd Qu.:  3.00   3rd Qu.: 2.000   3rd Qu.:  5.00   3rd Qu.: 111.0  
##  Max.   :516.00   Max.   :37.000   Max.   :247.00   Max.   :3299.0  
##                                                                     
##  category_brand_sold category_visits    ty_visits         category_basket  
##  Min.   :     0      Min.   :   367   Min.   :        1   Min.   :      0  
##  1st Qu.:     0      1st Qu.:  1432   1st Qu.:        1   1st Qu.:      0  
##  Median :     6      Median :  5324   Median :        1   Median :      0  
##  Mean   : 46247      Mean   : 27767   Mean   : 44737307   Mean   : 353021  
##  3rd Qu.: 94562      3rd Qu.:  9538   3rd Qu.:102143446   3rd Qu.: 464380  
##  Max.   :259590      Max.   :583672   Max.   :178545693   Max.   :3102147  
##                                                                            
##  category_favored     w_day            mon          is_campaign     
##  Min.   :  2324   Min.   :1.000   Min.   : 1.000   Min.   :0.00000  
##  1st Qu.:  8618   1st Qu.:2.000   1st Qu.: 4.000   1st Qu.:0.00000  
##  Median : 24534   Median :4.000   Median : 6.000   Median :0.00000  
##  Mean   : 33688   Mean   :4.007   Mean   : 6.464   Mean   :0.08642  
##  3rd Qu.: 50341   3rd Qu.:6.000   3rd Qu.: 9.000   3rd Qu.:0.00000  
##  Max.   :244883   Max.   :7.000   Max.   :12.000   Max.   :1.00000  
## 
##       price sold_count visit_count favored_count basket_count category_sold
## [1,]  -1.00          0           0             0            0             0
## [2,] 349.99          0           0             0            0            16
## [3,] 599.98          0           0             0            0            45
## [4,] 736.64          0           3             2            5           111
## [5,] 833.32          0           7             5           12           248
##      category_brand_sold category_visits ty_visits category_basket
## [1,]                   0             367         1               0
## [2,]                   0            1432         1               0
## [3,]                   6            5324         1               0
## [4,]               94562            9538 102143446          464380
## [5,]              235840           21187 178545693         1158593
##      category_favored w_day
## [1,]             2324     1
## [2,]             8618     2
## [3,]            24534     4
## [4,]            50341     6
## [5,]           111346     7

By considering correlation and variable reliability, “price”, “visit_count”, “basket_count”, and “category_favored” are selected as regressors.

The ACF and PACF graphs show high correlation at lag 1, lag 2, lag 5, and lag 7; therefore, these lags are added as attributes.

Since the jacket is an expensive product, it is expected that consumers consider its previous prices. Therefore, lagged prices of the jacket are also examined.
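The lagged attributes can be created with `data.table::shift`; the values below are made up, only the column names follow the report:

```r
library(data.table)

dt <- data.table(sold_count = c(0, 0, 2, 1, 0, 3, 2, 1),
                 price      = c(600, 600, 620, 620, 610, 610, 605, 605))

# lagged sales at the lags flagged by the ACF/PACF graphs
dt[, `:=`(lag1 = shift(sold_count, 1),
          lag2 = shift(sold_count, 2),
          lag5 = shift(sold_count, 5),
          lag7 = shift(sold_count, 7))]

# lagged price, since buyers of an expensive item may track earlier prices
dt[, price_lag_4 := shift(price, 4)]
```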

The data will be predicted based on the attributes of previous observations, since the actual attribute values are not available at prediction time.

Model Construction

The data does not have constant variance; therefore, besides the simple linear model, sqrt and BoxCox transformations of the response are also used for the regression model.

Simple Regression

By many iterations, it is seen that the most significant variables are price, visit_count, basket_count, category_favored, factor(w_day), factor(mon), lag1, lag2, and price_lag_4.

## 
## Call:
## lm(formula = sold_count ~ price + visit_count + basket_count + 
##     category_favored + factor(w_day) + factor(mon) + lag1 + lag2 + 
##     price_lag_4, data = train8)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.7381 -0.2841 -0.0468  0.2941  6.6801 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.330e+00  3.547e-01   3.750 0.000205 ***
## price             1.418e-03  4.286e-04   3.309 0.001029 ** 
## visit_count       1.290e-03  1.189e-03   1.085 0.278473    
## basket_count      1.875e-01  4.545e-03  41.246  < 2e-16 ***
## category_favored -2.489e-05  3.524e-06  -7.062 8.33e-12 ***
## factor(w_day)2    4.478e-01  2.281e-01   1.963 0.050385 .  
## factor(w_day)3    3.275e-01  2.294e-01   1.428 0.154278    
## factor(w_day)4    5.850e-01  2.294e-01   2.550 0.011173 *  
## factor(w_day)5    4.696e-01  2.311e-01   2.032 0.042920 *  
## factor(w_day)6    3.276e-01  2.303e-01   1.422 0.155776    
## factor(w_day)7    1.595e-01  2.291e-01   0.696 0.486652    
## factor(mon)2     -5.766e-02  3.193e-01  -0.181 0.856773    
## factor(mon)3     -4.079e-01  3.128e-01  -1.304 0.193058    
## factor(mon)4     -7.966e-01  3.313e-01  -2.404 0.016694 *  
## factor(mon)5     -1.088e+00  3.607e-01  -3.016 0.002739 ** 
## factor(mon)6     -1.503e+00  3.585e-01  -4.192 3.47e-05 ***
## factor(mon)7     -1.489e+00  3.760e-01  -3.960 9.01e-05 ***
## factor(mon)8     -1.373e+00  3.684e-01  -3.727 0.000224 ***
## factor(mon)9     -1.250e+00  3.570e-01  -3.502 0.000520 ***
## factor(mon)10     6.976e-01  4.587e-01   1.521 0.129181    
## factor(mon)11    -1.549e+00  3.584e-01  -4.323 1.99e-05 ***
## factor(mon)12    -9.394e-02  3.089e-01  -0.304 0.761183    
## lag1             -5.277e-05  2.168e-02  -0.002 0.998059    
## lag2             -7.379e-02  2.135e-02  -3.456 0.000612 ***
## price_lag_4      -2.195e-03  3.865e-04  -5.679 2.76e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.2 on 365 degrees of freedom
## Multiple R-squared:  0.8894, Adjusted R-squared:  0.8821 
## F-statistic: 122.3 on 24 and 365 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 28
## 
## data:  Residuals
## LM test = 150.89, df = 28, p-value < 2.2e-16

Simple Linear Regression with sqrt() Transformation

By many iterations, it is seen that price_lag_4 and lag2 are not significant for the sqrt-transformed model, while lag5 is significant.

## 
## Call:
## lm(formula = sqrt ~ price + visit_count + basket_count + category_favored + 
##     factor(w_day) + factor(mon) + lag1 + lag5, data = train8)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.14135 -0.06833  0.00032  0.05779  1.38530 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.452e-01  8.630e-02  -1.683 0.093257 .  
## price             1.862e-03  1.049e-04  17.759  < 2e-16 ***
## visit_count       1.884e-03  2.897e-04   6.502 2.60e-10 ***
## basket_count      2.595e-02  1.117e-03  23.222  < 2e-16 ***
## category_favored -2.433e-06  8.906e-07  -2.732 0.006599 ** 
## factor(w_day)2    1.256e-01  5.663e-02   2.217 0.027225 *  
## factor(w_day)3    7.640e-02  5.668e-02   1.348 0.178490    
## factor(w_day)4    9.278e-02  5.665e-02   1.638 0.102333    
## factor(w_day)5    4.810e-02  5.707e-02   0.843 0.399890    
## factor(w_day)6    5.388e-02  5.672e-02   0.950 0.342781    
## factor(w_day)7    1.172e-02  5.663e-02   0.207 0.836212    
## factor(mon)2     -1.259e-01  7.890e-02  -1.596 0.111264    
## factor(mon)3     -6.078e-02  7.733e-02  -0.786 0.432396    
## factor(mon)4     -9.304e-02  8.204e-02  -1.134 0.257475    
## factor(mon)5      5.873e-02  8.940e-02   0.657 0.511630    
## factor(mon)6     -1.577e-01  8.895e-02  -1.774 0.076969 .  
## factor(mon)7     -1.579e-01  9.348e-02  -1.689 0.092049 .  
## factor(mon)8     -1.472e-01  9.156e-02  -1.607 0.108821    
## factor(mon)9     -1.489e-01  8.865e-02  -1.679 0.093921 .  
## factor(mon)10     1.899e-02  1.055e-01   0.180 0.857272    
## factor(mon)11    -2.973e-01  8.586e-02  -3.463 0.000598 ***
## factor(mon)12    -7.452e-02  7.598e-02  -0.981 0.327344    
## lag1              2.248e-02  5.270e-03   4.266 2.54e-05 ***
## lag5             -8.956e-03  5.015e-03  -1.786 0.074978 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2966 on 366 degrees of freedom
## Multiple R-squared:  0.8919, Adjusted R-squared:  0.8851 
## F-statistic: 131.3 on 23 and 366 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 27
## 
## data:  Residuals
## LM test = 138.68, df = 27, p-value < 2.2e-16

In the residual analysis there is no significant difference, and the adjusted R-squared value of the sqrt-transformed model is higher.

Simple Linear Regression Model with BoxCox Transformation

By many iterations, price, visit_count, basket_count, category_favored, factor(w_day), factor(mon), and lag1 are found to be the most significant variables for the simple linear regression model with BoxCox transformation.

## 
## Call:
## lm(formula = BoxCox ~ price + visit_count + basket_count + category_favored + 
##     factor(w_day) + factor(mon) + lag1, data = train8)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0463 -0.1968 -0.0363  0.1387  4.1884 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -8.686e+00  2.907e-01 -29.884  < 2e-16 ***
## price             1.198e-02  3.529e-04  33.939  < 2e-16 ***
## visit_count       9.627e-03  9.792e-04   9.831  < 2e-16 ***
## basket_count      2.921e-02  3.754e-03   7.781 7.38e-14 ***
## category_favored -8.752e-07  2.941e-06  -0.298 0.766192    
## factor(w_day)2    2.736e-01  1.911e-01   1.432 0.152962    
## factor(w_day)3    1.992e-01  1.920e-01   1.037 0.300208    
## factor(w_day)4    1.375e-01  1.921e-01   0.716 0.474649    
## factor(w_day)5   -1.911e-03  1.935e-01  -0.010 0.992125    
## factor(w_day)6    1.953e-01  1.923e-01   1.016 0.310483    
## factor(w_day)7   -3.231e-02  1.920e-01  -0.168 0.866440    
## factor(mon)2     -6.849e-01  2.675e-01  -2.560 0.010856 *  
## factor(mon)3     -2.233e-01  2.619e-01  -0.852 0.394519    
## factor(mon)4     -2.158e-01  2.773e-01  -0.778 0.436990    
## factor(mon)5      1.364e+00  3.013e-01   4.527 8.08e-06 ***
## factor(mon)6     -1.162e-01  2.995e-01  -0.388 0.698387    
## factor(mon)7     -2.235e-01  3.144e-01  -0.711 0.477592    
## factor(mon)8     -2.174e-01  3.080e-01  -0.706 0.480741    
## factor(mon)9     -2.665e-01  2.986e-01  -0.892 0.372773    
## factor(mon)10    -3.659e-01  3.567e-01  -1.026 0.305735    
## factor(mon)11    -5.961e-01  2.829e-01  -2.107 0.035797 *  
## factor(mon)12    -2.368e-01  2.576e-01  -0.919 0.358460    
## lag1              6.100e-02  1.775e-02   3.436 0.000658 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.005 on 367 degrees of freedom
## Multiple R-squared:  0.918,  Adjusted R-squared:  0.9131 
## F-statistic: 186.8 on 22 and 367 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 26
## 
## data:  Residuals
## LM test = 114.48, df = 26, p-value = 4.496e-13

In the residual analysis and the adjusted R-squared comparison, the BoxCox model is better than the others; however, it is very sensitive to back-transformation, so its predictions may be poor.

Arima Models

When the ARIMA models are constructed, the auto.arima function is used and is re-run every day as new observations arrive. Seasonality is set to TRUE, and the frequency is determined as seven by observing the ACF and PACF graphs.

An additive model, a multiplicative model, and a linear regression model are used for decomposition to obtain stationary data.

## [1] "The Additive Model"
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.0089
## [1] "The Multiplicative Model"
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.069
## [1] "Linear Regression"
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.0142

The multiplicative model's KPSS statistic is significant at the α = 0.10 level; therefore, I will use the additive decomposition for the ARIMA and ARIMA-with-regressors models.

The linear regression model's residuals are stationary; therefore, an ARIMA model is fitted on these residuals, and the regression and ARIMA forecasts are combined at the end.

The regressors mentioned above are used for the ARIMA model with regressors.

Arima

## Series: decomposed$random 
## ARIMA(5,0,0) with zero mean 
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5
##       -0.1935  -0.4373  -0.4021  -0.3274  -0.1575
## s.e.   0.0504   0.0485   0.0492   0.0483   0.0502
## 
## sigma^2 estimated as 5.192:  log likelihood=-859.14
## AIC=1730.28   AICc=1730.51   BIC=1753.99

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(5,0,0) with zero mean
## Q* = 54.422, df = 9, p-value = 1.569e-08
## 
## Model df: 5.   Total lags used: 14

Arima with Regressors

## Series: decomposed$random 
## Regression with ARIMA(0,0,0)(0,0,2)[7] errors 
## 
## Coefficients:
##         sma1     sma2  intercept    xreg
##       0.1989  -0.1040    -0.3089  0.0013
## s.e.  0.0505   0.0501     0.2072  0.0006
## 
## sigma^2 estimated as 6.874:  log likelihood=-913.24
## AIC=1836.49   AICc=1836.65   BIC=1856.24

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(0,0,0)(0,0,2)[7] errors
## Q* = 100.13, df = 10, p-value < 2.2e-16
## 
## Model df: 4.   Total lags used: 14

Arima combined with linear Regression

## Series: residuals 
## ARIMA(1,0,4) with zero mean 
## 
## Coefficients:
##          ar1      ma1     ma2      ma3      ma4
##       0.7355  -0.7514  0.1745  -0.1317  -0.2326
## s.e.  0.0615   0.0740  0.0651   0.0736   0.0599
## 
## sigma^2 estimated as 1.142:  log likelihood=-577.29
## AIC=1166.59   AICc=1166.81   BIC=1190.39

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,4) with zero mean
## Q* = 8.9276, df = 5, p-value = 0.112
## 
## Model df: 5.   Total lags used: 10

Predictions

All of the models are used for prediction, including mul_arima and reg_mul_arima, since they are significant at the α = 0.10 level.

##     event_date actual sqrt_forecasted_sold BoxCox_forecasted_sold
##  1: 2021-06-19      0                    0                      0
##  2: 2021-06-20      1                    2                      3
##  3: 2021-06-21      2                    2                      3
##  4: 2021-06-22      2                    1                      0
##  5: 2021-06-23      2                    1                      0
##  6: 2021-06-24      2                    1                      0
##  7: 2021-06-25      2                    1                      0
##  8: 2021-06-26      1                    0                      0
##  9: 2021-06-27      0                    0                      0
## 10: 2021-06-28      4                    1                      0
## 11: 2021-06-29      1                    3                      6
## 12: 2021-06-30      0                    0                      0
## 13: 2021-07-01      1                    1                      1
## 14: 2021-07-02      2                    2                      2
##     lm_forecasted_sold forecasted_lm8_arima add_arima_forecasted
##  1:                 -1                   -1                    2
##  2:                  1                    2                    3
##  3:                  1                    2                    3
##  4:                  1                    0                    2
##  5:                  1                    1                    2
##  6:                  0                    0                    2
##  7:                  0                    0                    1
##  8:                  0                    1                    1
##  9:                 -1                    0                    1
## 10:                  1                    1                    1
## 11:                  2                    2                    2
## 12:                 -1                   -1                    2
## 13:                  2                    2                    3
## 14:                  1                    1                    1
##     mul_arima_forecasted reg_add_arima_forecasted reg_mul_arima_forecasted
##  1:                    2                        2                        0
##  2:                    1                        3                        0
##  3:                    2                        3                        5
##  4:                    2                        2                        5
##  5:                    2                        2                        3
##  6:                    2                        2                       -1
##  7:                    1                        1                        0
##  8:                    1                        1                        1
##  9:                    1                        1                        1
## 10:                    1                        1                        1
## 11:                    2                        2                        2
## 12:                    2                        2                        6
## 13:                    2                        3                        0
## 14:                    1                        1                        4

Error Rates

##                       model  n     mean      sd        CV FBias MAPE     RMSE
## 1:     sqrt_forecasted_sold 14 1.428571 1.08941 0.7625867  0.25  NaN 1.164965
## 2:   BoxCox_forecasted_sold 14 1.428571 1.08941 0.7625867  0.25  NaN 2.121320
## 3:       lm_forecasted_sold 14 1.428571 1.08941 0.7625867  0.65  Inf 1.388730
## 4:     forecasted_lm8_arima 14 1.428571 1.08941 0.7625867  0.50  NaN 1.414214
## 5:     add_arima_forecasted 14 1.428571 1.08941 0.7625867 -0.30  Inf 1.463850
## 6:     mul_arima_forecasted 14 1.428571 1.08941 0.7625867 -0.10  Inf 1.253566
## 7: reg_add_arima_forecasted 14 1.428571 1.08941 0.7625867 -0.30  Inf 1.463850
## 8: reg_mul_arima_forecasted 14 1.428571 1.08941 0.7625867 -0.35  NaN 2.464027
##          MAD MADP WMAPE
## 1: 0.7857143 0.55  0.55
## 2: 1.5000000 1.05  1.05
## 3: 1.2142857 0.85  0.85
## 4: 1.1428571 0.80  0.80
## 5: 1.1428571 0.80  0.80
## 6: 0.8571429 0.60  0.60
## 7: 1.1428571 0.80  0.80
## 8: 1.9285714 1.35  1.35

The error rates are very high; however, the range of the response variable is very narrow, so this is expected. For example, if the actual sales are 1 and the prediction is 2, the error rate is 100%.
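WMAPE, the metric used to rank the models, weights the absolute errors by the total actual sales, which is why a one-unit miss on a sale of 1 already yields 100% error. A minimal sketch with made-up numbers:

```r
# WMAPE = sum(|actual - forecast|) / sum(actual)
wmape <- function(actual, forecast) {
  sum(abs(actual - forecast)) / sum(actual)
}

# Hypothetical values, not the report's predictions
actual   <- c(1, 2, 0, 4)
forecast <- c(2, 2, 0, 1)
wmape(actual, forecast)  # (1 + 0 + 0 + 3) / 7 = 0.571...
```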

The mul_arima_forecasted model has the lowest error rate (WMAPE).

Next Day Prediction

Every day, the error rates of all models are recalculated over the last 14 days, and the model whose predictions have the lowest WMAPE is selected for the next-day forecast.

##           add_arima           mul_arima      xreg_mul_arima      xreg_add_arima 
##           1.5234013           0.4952038           1.1665486           1.5962115 
##         forecast_lm forecast_lm_arima.1           BoxCox_lm             Sqrt_lm 
##           0.6147960           0.3585320           1.6999947           2.0000000
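This daily model-selection step can be sketched as follows; the actuals and predictions below are hypothetical and only show the mechanism:

```r
wmape <- function(actual, forecast) sum(abs(actual - forecast)) / sum(actual)

# Hypothetical: actual sales for the last 14 days and three competing models
set.seed(3)
actual <- rpois(14, lambda = 2)
preds  <- cbind(add_arima   = actual + rnorm(14),
                mul_arima   = actual + rnorm(14, sd = 2),
                forecast_lm = actual + rnorm(14, sd = 0.5))

# Score each model over the 14-day window and pick the lowest WMAPE
scores     <- apply(preds, 2, function(p) wmape(actual, p))
best_model <- names(which.min(scores))
```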

Product 9 - TrendyolMilla Bikini Top

By observing the graph below, the month effect is clearly visible. This is expected, since bikinis are worn during the hot seasons in Turkey. Moreover, by examining the ACF and PACF graphs, it can be said that there is a trend in the data and correlation at lag 1 and lag 7.

The “price”, “category_sold”, “basket_count”, and “category_favored” attributes are more reliable and are significantly correlated with the sales data. Even though visit_count and favored_count are highly correlated with sales, they are also correlated with basket_count; therefore, they are not used as regressors.
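This screening, keeping regressors that correlate with sales while dropping ones that duplicate an already chosen regressor, can be sketched with cor(). The data below is a toy stand-in reusing the report's column names:

```r
# Toy data mimicking the report's columns
set.seed(4)
basket_count <- rpois(100, 20)
sold_count   <- basket_count * 0.2 + rnorm(100)
visit_count  <- basket_count * 25 + rnorm(100, sd = 30)  # near-duplicate of basket_count

cors <- cor(data.frame(sold_count, basket_count, visit_count))
round(cors, 2)
# visit_count correlates strongly with sold_count, but also with basket_count,
# so it is dropped in favour of basket_count alone
```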

##      price         event_date         product_content_id   sold_count    
##  Min.   :59.99   Min.   :2020-05-25   Length:405         Min.   :  0.00  
##  1st Qu.:59.99   1st Qu.:2020-09-03   Class :character   1st Qu.:  0.00  
##  Median :59.99   Median :2020-12-13   Mode  :character   Median :  0.00  
##  Mean   :60.11   Mean   :2020-12-13                      Mean   : 18.35  
##  3rd Qu.:59.99   3rd Qu.:2021-03-24                      3rd Qu.:  3.00  
##  Max.   :63.55   Max.   :2021-07-03                      Max.   :286.00  
##  NA's   :281                                                             
##   visit_count    favored_count     basket_count     category_sold 
##  Min.   :    0   Min.   :   0.0   Min.   :   0.00   Min.   :  20  
##  1st Qu.:    0   1st Qu.:   0.0   1st Qu.:   0.00   1st Qu.: 132  
##  Median :    0   Median :   0.0   Median :   0.00   Median : 563  
##  Mean   : 2457   Mean   : 240.8   Mean   :  88.64   Mean   :1301  
##  3rd Qu.:  589   3rd Qu.: 112.0   3rd Qu.:  19.00   3rd Qu.:1676  
##  Max.   :45833   Max.   :5011.0   Max.   :1735.00   Max.   :8099  
##                                                                   
##  category_brand_sold category_visits     ty_visits         category_basket  
##  Min.   :     0      Min.   :    107   Min.   :        1   Min.   :      0  
##  1st Qu.:     0      1st Qu.:    397   1st Qu.:        1   1st Qu.:      0  
##  Median :  2965      Median :   1362   Median :        1   Median :      0  
##  Mean   : 14028      Mean   :  82604   Mean   : 44737307   Mean   : 118415  
##  3rd Qu.: 15079      3rd Qu.:   2871   3rd Qu.:102143446   3rd Qu.: 101167  
##  Max.   :152168      Max.   :1335060   Max.   :178545693   Max.   :1230833  
##                                                                             
##  category_favored     w_day            mon          is_campaign     
##  Min.   :   628   Min.   :1.000   Min.   : 1.000   Min.   :0.00000  
##  1st Qu.:  2589   1st Qu.:2.000   1st Qu.: 4.000   1st Qu.:0.00000  
##  Median :  7843   Median :4.000   Median : 6.000   Median :0.00000  
##  Mean   : 15287   Mean   :4.007   Mean   : 6.464   Mean   :0.08642  
##  3rd Qu.: 16401   3rd Qu.:6.000   3rd Qu.: 9.000   3rd Qu.:0.00000  
##  Max.   :135551   Max.   :7.000   Max.   :12.000   Max.   :1.00000  
## 

The trend, lag1, lag2, lag3, and lag7 variables are added to the data.
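Creating the trend and lag features can be sketched in base R; the vector below is a tiny stand-in for the real sold_count series:

```r
# Simple lag helper: shift a vector down by k, padding the front with NA
lagit <- function(x, k) c(rep(NA, k), head(x, -k))

sold_count <- c(3, 5, 2, 8, 6, 4, 7, 9)  # toy values
features <- data.frame(
  sold_count,
  trend = seq_along(sold_count),
  lag1  = lagit(sold_count, 1),
  lag2  = lagit(sold_count, 2),
  lag3  = lagit(sold_count, 3),
  lag7  = lagit(sold_count, 7)
)
```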

Model Construction

The data does not have constant variance; therefore, besides the simple linear model, square-root and Box-Cox transformations of the response are used for two more regression models.
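The two transformations can be sketched as below. The lambda value here is an assumed one for illustration; in practice it is estimated from the data (e.g. with forecast::BoxCox.lambda):

```r
sold <- c(0, 1, 4, 9, 25, 100)  # toy sales values including zeros

# Square-root transformation
sqrt_sold <- sqrt(sold)

# Box-Cox transformation with an assumed lambda; the +1 shift keeps zeros valid
lambda <- 0.3
boxcox_sold <- ((sold + 1)^lambda - 1) / lambda

# Back-transformation used when producing forecasts on the original scale
back <- (boxcox_sold * lambda + 1)^(1 / lambda) - 1
```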

For Product 9 the attributes are reliable; therefore, all attributes are tried in the model and the most significant ones are selected.

Simple Linear Regression with No Transformation

## 
## Call:
## lm(formula = sold_count ~ price + visit_count + basket_count + 
##     favored_count + category_sold + category_visits + category_basket + 
##     category_favored + category_brand_sold + factor(w_day) + 
##     factor(mon) + trend + lag1 + lag3, data = train9)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -25.248  -1.112  -0.030   1.443  31.628 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.410e+02  1.085e+02  -2.221 0.026981 *  
## price                4.057e+00  1.816e+00   2.234 0.026091 *  
## visit_count         -1.082e-03  6.195e-04  -1.747 0.081498 .  
## basket_count         2.038e-01  7.661e-03  26.600  < 2e-16 ***
## favored_count       -4.815e-03  4.388e-03  -1.097 0.273267    
## category_sold        5.148e-03  8.736e-04   5.893 8.74e-09 ***
## category_visits      3.453e-06  8.701e-06   0.397 0.691717    
## category_basket      3.348e-05  1.498e-05   2.234 0.026105 *  
## category_favored    -3.223e-04  8.046e-05  -4.005 7.53e-05 ***
## category_brand_sold -1.555e-04  1.243e-04  -1.251 0.211659    
## factor(w_day)2      -1.574e+00  1.128e+00  -1.396 0.163695    
## factor(w_day)3       7.633e-01  1.149e+00   0.664 0.506889    
## factor(w_day)4       1.103e-01  1.151e+00   0.096 0.923689    
## factor(w_day)5       3.665e-02  1.152e+00   0.032 0.974643    
## factor(w_day)6      -2.088e-01  1.136e+00  -0.184 0.854199    
## factor(w_day)7       5.030e-01  1.137e+00   0.442 0.658438    
## factor(mon)2        -7.007e+00  1.823e+00  -3.842 0.000144 ***
## factor(mon)3        -7.194e+00  1.758e+00  -4.092 5.29e-05 ***
## factor(mon)4        -7.014e+00  1.985e+00  -3.534 0.000463 ***
## factor(mon)5        -9.773e+00  3.846e+00  -2.541 0.011477 *  
## factor(mon)6        -6.645e+00  3.545e+00  -1.874 0.061692 .  
## factor(mon)7        -3.371e+00  3.161e+00  -1.066 0.286926    
## factor(mon)8        -5.443e-01  2.762e+00  -0.197 0.843853    
## factor(mon)9        -1.190e+00  2.519e+00  -0.472 0.636947    
## factor(mon)10       -2.114e+00  2.274e+00  -0.930 0.353123    
## factor(mon)11       -1.897e+00  2.053e+00  -0.924 0.355932    
## factor(mon)12       -1.069e+00  1.671e+00  -0.640 0.522666    
## trend               -6.034e-03  1.412e-02  -0.427 0.669394    
## lag1                 9.079e-02  2.380e-02   3.814 0.000161 ***
## lag3                 7.836e-02  1.787e-02   4.385 1.53e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.85 on 360 degrees of freedom
## Multiple R-squared:  0.9861, Adjusted R-squared:  0.985 
## F-statistic: 880.1 on 29 and 360 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 33
## 
## data:  Residuals
## LM test = 166.95, df = 33, p-value < 2.2e-16

The adjusted R-squared value is very high and the residuals are centered around zero, so the model can be a good fit, although the Breusch-Godfrey test still indicates serial correlation in the residuals.

Simple Linear Regression Model with sqrt transformation

## 
## Call:
## lm(formula = sqrt ~ price + visit_count + basket_count + favored_count + 
##     category_sold + category_visits + category_basket + category_favored + 
##     category_brand_sold + ty_visits + factor(w_day) + factor(mon) + 
##     lag1 + lag3, data = train9)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3358 -0.2404 -0.0610  0.1759  4.8481 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -1.256e+01  1.292e+01  -0.972  0.33175    
## price                2.086e-01  2.149e-01   0.970  0.33246    
## visit_count          4.802e-05  7.202e-05   0.667  0.50537    
## basket_count         1.009e-02  9.288e-04  10.862  < 2e-16 ***
## favored_count       -6.224e-04  4.932e-04  -1.262  0.20779    
## category_sold        5.387e-04  1.068e-04   5.046 7.19e-07 ***
## category_visits      2.765e-06  8.455e-07   3.271  0.00118 ** 
## category_basket      3.112e-06  1.908e-06   1.631  0.10379    
## category_favored    -4.993e-05  9.264e-06  -5.390 1.28e-07 ***
## category_brand_sold -5.552e-06  1.569e-05  -0.354  0.72361    
## ty_visits            1.457e-08  3.008e-09   4.843 1.90e-06 ***
## factor(w_day)2       1.471e-01  1.361e-01   1.081  0.28039    
## factor(w_day)3       2.979e-01  1.380e-01   2.158  0.03156 *  
## factor(w_day)4       3.219e-01  1.386e-01   2.323  0.02074 *  
## factor(w_day)5       3.376e-01  1.387e-01   2.435  0.01539 *  
## factor(w_day)6       3.561e-01  1.369e-01   2.602  0.00966 ** 
## factor(w_day)7       2.701e-01  1.364e-01   1.980  0.04851 *  
## factor(mon)2        -9.424e-02  3.247e-01  -0.290  0.77180    
## factor(mon)3        -1.099e+00  2.770e-01  -3.967 8.80e-05 ***
## factor(mon)4        -2.480e+00  2.941e-01  -8.433 8.28e-16 ***
## factor(mon)5        -9.449e-01  3.217e-01  -2.937  0.00352 ** 
## factor(mon)6        -2.390e-01  2.775e-01  -0.861  0.38974    
## factor(mon)7        -3.521e-02  2.690e-01  -0.131  0.89593    
## factor(mon)8         1.451e-01  2.327e-01   0.624  0.53322    
## factor(mon)9        -5.139e-02  2.249e-01  -0.228  0.81939    
## factor(mon)10       -2.160e-01  2.243e-01  -0.963  0.33615    
## factor(mon)11       -2.139e-01  2.254e-01  -0.949  0.34312    
## factor(mon)12       -1.965e-01  1.960e-01  -1.003  0.31662    
## lag1                 7.882e-03  2.876e-03   2.741  0.00644 ** 
## lag3                 3.603e-03  2.123e-03   1.697  0.09052 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7076 on 360 degrees of freedom
## Multiple R-squared:  0.9688, Adjusted R-squared:  0.9662 
## F-statistic: 384.9 on 29 and 360 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 33
## 
## data:  Residuals
## LM test = 177.51, df = 33, p-value < 2.2e-16

The square-root transformation also yields a good fit according to the R-squared value and residual analysis; however, its R-squared value is lower than that of the untransformed model.

BoxCox Transformation

## 
## Call:
## lm(formula = BoxCox ~ price + visit_count + basket_count + favored_count + 
##     category_visits + category_basket + ty_visits + factor(w_day) + 
##     factor(mon) + lag1 + lag3, data = train9)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.4824 -0.4164 -0.0942  0.4179  7.3159 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -4.153e+00  2.520e+01  -0.165  0.86917    
## price            5.291e-04  4.193e-01   0.001  0.99899    
## visit_count      2.403e-04  1.323e-04   1.816  0.07022 .  
## basket_count     5.456e-03  1.676e-03   3.255  0.00124 ** 
## favored_count   -1.583e-03  7.530e-04  -2.102  0.03626 *  
## category_visits  2.171e-06  9.660e-07   2.247  0.02521 *  
## category_basket  3.101e-06  1.087e-06   2.853  0.00459 ** 
## ty_visits        3.526e-08  5.591e-09   6.307 8.28e-10 ***
## factor(w_day)2   3.732e-01  2.661e-01   1.402  0.16164    
## factor(w_day)3   5.536e-01  2.668e-01   2.075  0.03865 *  
## factor(w_day)4   7.663e-01  2.673e-01   2.867  0.00438 ** 
## factor(w_day)5   8.076e-01  2.664e-01   3.032  0.00261 ** 
## factor(w_day)6   7.532e-01  2.663e-01   2.828  0.00494 ** 
## factor(w_day)7   5.979e-01  2.669e-01   2.240  0.02569 *  
## factor(mon)2     1.031e+00  6.301e-01   1.637  0.10257    
## factor(mon)3    -8.714e-01  5.402e-01  -1.613  0.10759    
## factor(mon)4    -5.120e+00  5.691e-01  -8.997  < 2e-16 ***
## factor(mon)5    -1.495e+00  5.203e-01  -2.874  0.00430 ** 
## factor(mon)6    -1.056e-01  3.524e-01  -0.300  0.76452    
## factor(mon)7    -6.734e-01  3.545e-01  -1.900  0.05825 .  
## factor(mon)8    -6.263e-01  3.544e-01  -1.767  0.07803 .  
## factor(mon)9    -6.531e-01  3.574e-01  -1.828  0.06844 .  
## factor(mon)10   -6.609e-01  3.543e-01  -1.866  0.06291 .  
## factor(mon)11   -6.200e-01  3.572e-01  -1.736  0.08344 .  
## factor(mon)12   -6.594e-01  3.545e-01  -1.860  0.06364 .  
## lag1             9.810e-03  5.592e-03   1.754  0.08020 .  
## lag3             2.617e-03  4.146e-03   0.631  0.52842    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.386 on 363 degrees of freedom
## Multiple R-squared:  0.9192, Adjusted R-squared:  0.9134 
## F-statistic: 158.9 on 26 and 363 DF,  p-value: < 2.2e-16

## 
##  Breusch-Godfrey test for serial correlation of order up to 30
## 
## data:  Residuals
## LM test = 169.2, df = 30, p-value < 2.2e-16

The Box-Cox transformation can also be a good fit, since its adjusted R-squared value is high.

In all linear models, the residuals are significantly correlated at lag 1, which is not desirable.

Arima Models

When the ARIMA models are constructed, the auto.arima function is used, and it is re-run every day. Seasonality is set to TRUE, and the frequency is determined as seven by observing the ACF and PACF graphs.
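A minimal stand-in for this step, using base stats::arima with hand-picked orders since auto.arima belongs to the forecast package:

```r
# Weekly-frequency toy series (the report re-runs auto.arima on each new day)
set.seed(7)
y <- ts(10 + sin(2 * pi * seq_len(140) / 7) + rnorm(140), frequency = 7)

# Illustrative fixed orders; auto.arima(y, seasonal = TRUE) selects them automatically
fit <- arima(y, order = c(0, 0, 2),
             seasonal = list(order = c(0, 0, 1), period = 7))

# One-step-ahead (next-day) forecast
next_day <- predict(fit, n.ahead = 1)$pred
```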

The additive model, the multiplicative model, and the linear regression model are used for decomposition, in order to obtain stationary data.

## [1] "The Additive Model"
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.0082
## [1] "The Multiplicative Model"
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.083
## [1] "Linear Regression"
## 
## ####################################### 
## # KPSS Unit Root / Cointegration Test # 
## ####################################### 
## 
## The value of the test statistic is: 0.0267

The additive model is used in the examination; however, the multiplicative model is also used in the predictions and its error rate is calculated, since it is significant at the 0.05 level.

Arima

## Series: decomposed$random 
## ARIMA(0,0,2)(0,0,2)[7] with zero mean 
## 
## Coefficients:
##          ma1     ma2    sma1    sma2
##       0.0166  -0.210  0.1241  0.1177
## s.e.  0.0661   0.076  0.0562  0.0586
## 
## sigma^2 estimated as 101.9:  log likelihood=-1430.87
## AIC=2871.74   AICc=2871.9   BIC=2891.49

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,2)(0,0,2)[7] with zero mean
## Q* = 61.582, df = 10, p-value = 1.816e-09
## 
## Model df: 4.   Total lags used: 14

Arima with Regressor

## Series: decomposed$random 
## Regression with ARIMA(0,0,2)(1,0,2)[7] errors 
## 
## Coefficients:
##           ma1      ma2     sar1    sma1    sma2  intercept     xreg
##       -0.0805  -0.3896  -0.8042  0.8776  0.2227   355.0113  -5.9076
## s.e.   0.0845   0.1029   0.0895  0.0983  0.0595   144.1929   2.3990
## 
## sigma^2 estimated as 99.44:  log likelihood=-1425.37
## AIC=2866.75   AICc=2867.13   BIC=2898.35

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(0,0,2)(1,0,2)[7] errors
## Q* = 74.376, df = 7, p-value = 1.92e-13
## 
## Model df: 7.   Total lags used: 14

Arima Combined with Linear Regression

## Series: residuals 
## ARIMA(1,0,0) with zero mean 
## 
## Coefficients:
##          ar1
##       0.1640
## s.e.  0.0499
## 
## sigma^2 estimated as 30.82:  log likelihood=-1221.38
## AIC=2446.77   AICc=2446.8   BIC=2454.7

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0) with zero mean
## Q* = 17.102, df = 9, p-value = 0.04714
## 
## Model df: 1.   Total lags used: 10

The ARIMA model combined with linear regression has the lowest AIC value and, according to the Ljung-Box tests, its residuals are the closest to uncorrelated noise around zero; therefore, it can be the best-fit model.
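The winning combination, linear regression followed by ARIMA on its residuals, can be sketched as below. The data and the next-day regressor value are toy assumptions, only the mechanism matches the report:

```r
# Toy data: response driven by one regressor plus autocorrelated noise
set.seed(9)
n <- 200
x <- rnorm(n)
noise <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y <- 2 + 3 * x + noise

# Step 1: linear regression captures the regressor effect
lm_fit <- lm(y ~ x)

# Step 2: ARIMA captures the autocorrelation left in the residuals
res_fit <- arima(residuals(lm_fit), order = c(1, 0, 0), include.mean = FALSE)

# Step 3: combined one-step-ahead forecast = lm prediction + residual forecast
new_x <- 0.5  # hypothetical next-day regressor value
combined <- predict(lm_fit, newdata = data.frame(x = new_x)) +
  predict(res_fit, n.ahead = 1)$pred
```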

Predictions

##     event_date actual sqrt_forecasted_sold BoxCox_forecasted_sold
##  1: 2021-06-19     26             40.71978              23.843419
##  2: 2021-06-20     15             37.83503              29.940431
##  3: 2021-06-21     20             18.10912              12.391943
##  4: 2021-06-22     47             18.99811              10.866454
##  5: 2021-06-23     40             23.85368              16.080255
##  6: 2021-06-24     37             22.29100              15.408298
##  7: 2021-06-25     20             21.12638              11.809304
##  8: 2021-06-26     27             15.25724               7.735327
##  9: 2021-06-27     20             29.69443              23.901945
## 10: 2021-06-28     26             16.28321              13.350304
## 11: 2021-06-29     19             29.98746              31.296925
## 12: 2021-06-30     20             29.00951              29.632218
## 13: 2021-07-01     14             20.42428              15.870863
## 14: 2021-07-02      8             14.63988              10.540106
##     lm_forecasted_sold forecasted_lm9_arima add_arima_forecasted
##  1:           53.60680             51.82296             53.17379
##  2:           25.63597             21.51158             55.07792
##  3:           28.82212             31.34634             50.73318
##  4:           48.30746             42.48172             39.35532
##  5:           41.30164             42.75290             40.53021
##  6:           38.18430             35.96907             37.48216
##  7:           32.00696             34.95751             33.04259
##  8:           26.10417             23.68023             28.24629
##  9:           18.14258             20.09614             31.74235
## 10:           18.17077             16.38016             32.17666
## 11:           29.19542             30.60067             28.66079
## 12:           32.02036             29.43861             25.90133
## 13:           27.17432             26.75186             24.36589
## 14:           16.04085             11.84275             20.95494
##     mul_arima_forecasted reg_add_arima_forecasted reg_mul_arima_forecasted
##  1:             38.06669                 53.13962                 37.64733
##  2:             74.23554                 55.08468                 74.99982
##  3:             37.16847                 50.77435                 46.35113
##  4:             32.56984                 39.42061                 31.36196
##  5:             53.92517                 40.56712                 54.00017
##  6:             38.03248                 37.93574                 39.40081
##  7:             27.91288                 33.47467                 29.19652
##  8:             23.01908                 28.63710                 23.45338
##  9:             40.43471                 32.14315                 41.18473
## 10:             22.61740                 32.60195                 23.04039
## 11:             23.99802                 29.05578                 24.46084
## 12:             35.12226                 26.29765                 35.80254
## 13:             24.92684                 24.74094                 25.39693
## 14:             18.02096                 21.34210                 18.35011

Error Rates

##                       model  n     mean       sd       CV        FBias
## 1:     sqrt_forecasted_sold 14 24.21429 10.72867 0.443072  0.002274018
## 2:   BoxCox_forecasted_sold 14 24.21429 10.72867 0.443072  0.254667279
## 3:       lm_forecasted_sold 14 24.21429 10.72867 0.443072 -0.282341375
## 4:     forecasted_lm9_arima 14 24.21429 10.72867 0.443072 -0.237854005
## 5:     add_arima_forecasted 14 24.21429 10.72867 0.443072 -0.479184122
## 6:     mul_arima_forecasted 14 24.21429 10.72867 0.443072 -0.445576199
## 7: reg_add_arima_forecasted 14 24.21429 10.72867 0.443072 -0.490311090
## 8: reg_mul_arima_forecasted 14 24.21429 10.72867 0.443072 -0.488633222
##         MAPE     RMSE       MAD      MADP     WMAPE
## 1: 0.5176652 13.67392 11.688884 0.4827268 0.4827268
## 2: 0.4853113 15.80493 12.621227 0.5212306 0.5212306
## 3: 0.4582578 10.87023  8.348478 0.3447749 0.3447749
## 4: 0.4219104 10.67494  8.400723 0.3469325 0.3469325
## 5: 0.7234923 17.11155 12.695198 0.5242855 0.5242855
## 6: 0.7644155 19.51808 13.902693 0.5741525 0.5741525
## 7: 0.7378675 17.23471 12.955303 0.5350273 0.5350273
## 8: 0.8187681 20.61137 14.995371 0.6192779 0.6192779

The linear regression model with no transformation has the lowest WMAPE value; therefore, it is selected as the best-fit model. However, the selection is not fixed: every day, the error rates of all models are recalculated over the last 14 days, and the model whose predictions have the lowest WMAPE is used for the next-day forecast.

Next Day Prediction

##           add_arima           mul_arima      xreg_mul_arima      xreg_add_arima 
##           18.701038           22.265567           22.686007           19.097003 
##         forecast_lm forecast_lm_arima.1           BoxCox_lm             Sqrt_lm 
##           14.047509           10.743843            8.938846           12.659273

CONCLUSION

In order to predict the one-day-ahead sales of the different products, various ARIMA and linear regression models have been tried, and according to their performance on the test set, which consists of the dates from 29 May 2021 to 11 June 2021, a different model has been selected for each product. As external data, the campaign dates of Trendyol are included; however, since not every Trendyol campaign is listed on the website, some of the outliers may not be fully explained by the models, and further investigation could improve them. Also, sales are affected by the overall state of the economy, so more external data, such as the dollar exchange rate, could be included for improved accuracy.

Approaching each product differently is one of the strong sides of this study, even though it is a time-consuming task. Trying various models and measuring their performance on the test data is another strength of the models proposed for each product.

Overall, it can be said that the models work reasonably well; the deviation from the real values is not too big.

REFERENCES

Lecture Notes

RMD

The code of this study is available here.